Preface


This book of proceedings gathers the contributions presented at the 3rd URV Doctoral Workshop in Computer Science and Mathematics. After the successful previous editions in 2014 and 2015, the third edition was held in Tarragona (Catalonia, Spain) on November 17th, 2016. It was jointly organized by the research group Algorithms Embedded in Physical Systems (ALEPHSYS) and the Doctoral Program on Computer Science and Mathematics of Security of Universitat Rovira i Virgili (URV). The main aim of this workshop is to promote the dissemination of the ideas, methods and results developed in the doctoral theses of the students of this doctoral program, and to foster mutual knowledge, collaboration and discussion between their respective research groups.

The workshop had two invited talks and twelve oral presentations. The first invited talk was given by Prof. Jordi García-Ojalvo, the leader of the Dynamical Systems Biology Lab at Universitat Pompeu Fabra, who talked about how models can be applied to understand life, from gene circuits to neural networks. The second invited talk was given by Prof. Jordi Vitrià, coordinator of the BCN Perceptual Computing Lab at Universitat de Barcelona. He provided an overview and practical hints on one of the currently most important topics in Artificial Intelligence: Deep Learning.

In this book, the reader will find the contributions of the Ph.D. students. Each chapter presents the research topic of one student, its goals and some of the results. It is worth noting the wide coverage of this workshop, with contributions to the following main research lines: (1) Security and privacy in computer systems, (2) Artificial intelligence, robotics and vision, (3) Telematic architectures and complex networks, and (4) Mathematics. All contributions present innovative proposals, methods or applications, with the aim of opening new and strategic research lines.

The editors and organizers invite you to contact the authors for more detailed explanations, and we encourage you to send them your suggestions and comments, which may certainly help them in the next steps of their Ph.D. theses. The organizing committee was formed by Dr. Sergio Gómez, Dr. Aida Valls (Coordinator of the Ph.D. program), Mr. Joan T. Matamalas and Mrs. Olga Segu.

We could not finish without first thanking the invited speakers for agreeing to contribute and for giving us such interesting talks. Second, we thank all the participants and, especially, the students who presented their work in this DCSM workshop. Finally, we also want to thank Universitat Rovira i Virgili (URV), the Departament d'Enginyeria Informàtica i Matemàtiques (DEIM), and the Escola Tècnica Superior d'Enginyeria (ETSE) for their support.


Sergio Gómez and Aida Valls (Editors)
1 Introduction


In a metric space $(M,d)$ we say that an element $a \in M$ distinguishes two other elements $b, c \in M$ if $d(a,b) \neq d(a,c)$. A set $B \subseteq M$ is a metric generator for $M$ if every pair of elements of $M$ is distinguished by some element of $B$. The metric dimension of the space is defined as the minimum cardinality of a metric generator. The notion of metric dimension was introduced by Blumenthal [1] in 1953 and, in the context of graph theory, independently by Harary and Melter [2] and Slater [3] in 1975. A metric can be defined on a connected graph by taking the distance between two vertices to be the length of a shortest path between them. In 1996 Khuller et al. [4] proved that the decision problem of determining whether the metric dimension is less than a given value is NP-complete. My thesis focuses on a variant of the metric dimension, the local metric dimension, introduced by Okamoto et al. in 2010 [5]. When we calculate the local metric dimension of a graph, we are concerned only with distinguishing pairs of adjacent vertices. Rodríguez-Velázquez and Fernau proved in 2014 [6] that calculating the local metric dimension is also NP-hard. The aim of the thesis is to tackle the problem of calculating the local metric dimension of graphs obtained from simpler ones by graph operations, generally called graph products. For a detailed study of such products we refer to [9].


2 Basic definitions and results 


Let $G = (V, E)$ be an undirected graph without multiple edges. For a pair of vertices $u, v \in V$ we denote by $d_G(u,v)$ the length of a shortest $u,v$-path. If $G$ is connected, $(V, d_G)$ is clearly a metric space. A set $B \subseteq V$ is a local metric generator for $G$ if for every edge $uv \in E$ there exists $x \in B$ such that $d_G(x,u) \neq d_G(x,v)$. The minimum cardinality of a local metric generator is called the local metric dimension of $G$ and is denoted by $\dim_l(G)$; a local metric generator of that cardinality is called a local metric basis for $G$.
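For small graphs these definitions can be checked by brute force. The following Python sketch is our own illustration, not part of the thesis, and its function names are hypothetical. It computes all shortest-path distances by breadth-first search, assuming a connected graph, and then searches subsets of increasing size for a metric generator or a local metric generator. The running time is exponential, in line with the NP-hardness results cited above, so it is only meant for tiny examples.

```python
from collections import deque
from itertools import combinations


def graph_from_edges(edges):
    """Build an undirected graph as a dict: vertex -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def all_distances(adj):
    """All-pairs shortest-path lengths, one BFS per vertex (graph assumed connected)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[s] = d
    return dist


def is_metric_generator(adj, B, dist):
    """Every pair of distinct vertices is distinguished by some x in B."""
    return all(any(dist[x][u] != dist[x][v] for x in B)
               for u, v in combinations(adj, 2))


def is_local_metric_generator(adj, B, dist):
    """Every pair of adjacent vertices u, v is distinguished by some x in B."""
    return all(any(dist[x][u] != dist[x][v] for x in B)
               for u in adj for v in adj[u])


def minimum_generator(adj, is_generator):
    """Exhaustive search for a smallest B accepted by is_generator; returns (|B|, B)."""
    dist = all_distances(adj)
    for r in range(len(adj) + 1):
        for B in combinations(adj, r):
            if is_generator(adj, B, dist):
                return r, set(B)


P4 = graph_from_edges([(0, 1), (1, 2), (2, 3)])
C5 = graph_from_edges([(i, (i + 1) % 5) for i in range(5)])
K4 = graph_from_edges(combinations(range(4), 2))
print(minimum_generator(C5, is_local_metric_generator)[0])  # 2
```

For a bipartite graph such as $P_4$ a single vertex already suffices, because adjacent vertices always lie at distances of different parity from any vertex.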
The article by Okamoto et al. [5] contains many results; among them is the following.


Theorem 1. If $G \,\square\, H$ denotes the Cartesian product of the graphs $G$ and $H$, then


$$\dim_l(G \,\square\, H) = \max\{\dim_l(G), \dim_l(H)\}.$$


This result is the origin of our research. We study the local metric dimension of the strong product, the lexicographic product and the corona product of graphs, and of graphs obtained by point attaching, as well as the simultaneous local metric dimension of families of lexicographic product graphs.

As an illustration of our work, we focus on our most recent article, which deals with the calculation of the local metric dimension of lexicographic product graphs [10].


2.1 The lexicographic product 


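Let $G$ be a graph of order $n$ with vertex set $V(G) = \{u_1, u_2, \ldots, u_n\}$, and let $\mathcal{H} = \{H_1, H_2, \ldots, H_n\}$ be an ordered family of $n$ graphs. The lexicographic product of $G$ and $\mathcal{H}$ is the graph $G \circ \mathcal{H}$ such that $V(G \circ \mathcal{H}) = \bigcup_{u_i \in V(G)} (\{u_i\} \times V(H_i))$, and $(u_i, v_r)(u_j, v_s) \in E(G \circ \mathcal{H})$ if and only if $u_i u_j \in E(G)$, or $i = j$ and $v_r v_s \in E(H_i)$.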
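The definition translates directly into code. The sketch below is illustrative only and uses hypothetical names. It assumes an adjacency-dict representation in which vertex $u_i$ of $G$ is encoded as the integer `i` and $H_i$ is given as `H[i]`. It builds $G \circ \mathcal{H}$ and measures distances by breadth-first search, so the behaviour of distances in the product can be explored on small examples.

```python
from collections import deque


def lexicographic_product(G, H):
    """Adjacency dict of G o H.

    G: dict i -> set of neighbours (vertex u_i encoded as i).
    H: list or dict with H[i] = adjacency dict of the graph H_i.
    """
    adj = {(i, b): set() for i in G for b in H[i]}
    for (i, b) in adj:
        # (u_i, b) ~ (u_j, d) for every vertex d of H_j whenever u_i u_j is an edge of G
        for j in G[i]:
            adj[(i, b)].update((j, d) for d in H[j])
        # inside the same copy, (u_i, b) ~ (u_i, d) iff bd is an edge of H_i
        adj[(i, b)].update((i, d) for d in H[i][b])
    return adj


def distance(adj, s, t):
    """BFS distance from s to t (None if t is unreachable)."""
    seen = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return seen[u]
        for w in adj[u]:
            if w not in seen:
                seen[w] = seen[u] + 1
                queue.append(w)
    return None


P3 = {0: {1}, 1: {0, 2}, 2: {1}}
P4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
K1 = {0: set()}
K2 = {0: {1}, 1: {0}}
L = lexicographic_product(P3, [P4, K1, K2])
print(len(L))                       # 4 + 1 + 2 = 7 vertices
print(distance(L, (0, 0), (0, 3)))  # 2, although the distance between 0 and 3 in P4 is 3
```

The last line shows why distances inside a single copy of $H_i$ never exceed 2 when $G$ is connected: any neighbour $u_j$ of $u_i$ in $G$ provides a common neighbour.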




Fig. 1: The lexicographic product graphs $P_3 \circ \{P_4, K_2, P_3\}$ and $P_4 \circ \{H_1, H_2, H_3, H_4\}$, where $H_1 = H_4 = K_1$ and $H_2 = H_3 = K_2$.


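If $G$ is a connected graph of order $n \geq 2$ and $(u_i,b)$ and $(u_j,d)$ are vertices of $G \circ \mathcal{H}$, then

$$d_{G \circ \mathcal{H}}((u_i,b),(u_j,d)) = \begin{cases} d_G(u_i,u_j), & \text{if } i \neq j,\\ d_{H_i,2}(b,d), & \text{if } i = j. \end{cases}$$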


Here $d_{H_i,2}(b,d) = \min\{d_{H_i}(b,d), 2\}$ is the 2-truncated distance in the graph $H_i$, with $d_{H_i}(b,d) = \infty$ when $b$ and $d$ lie in different components. For each $H_i \in \mathcal{H}$, $(V(H_i), d_{H_i,2})$ is a metric space, so it makes sense to ask for its (local) metric dimension. This parameter is called the (local) adjacency dimension of $H_i$ and is denoted by $\operatorname{adim}(H_i)$ (respectively, $\operatorname{adim}_l(H_i)$). The adjacency dimension was introduced by Jannesari and Omoomi [8] as a tool to study the metric dimension of lexicographic product graphs. The local version of the adjacency dimension was introduced by Fernau and Rodríguez-Velázquez [7], who calculated the metric dimension and the local metric dimension of the corona product of graphs. In the same paper they also proved that computing the (local) adjacency dimension is NP-hard.
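Because $d_{H,2}$ only takes the values 0, 1 and 2, it can be read off the adjacency relation directly, without shortest-path computations, and it remains meaningful for disconnected graphs. The brute-force sketch below is illustrative only (hypothetical names, exponential time) and computes $\operatorname{adim}_l(H)$ for small graphs.

```python
from itertools import combinations


def d2(adj, x, y):
    """2-truncated distance: 0 if x == y, 1 if adjacent, 2 otherwise."""
    if x == y:
        return 0
    return 1 if y in adj[x] else 2


def local_adjacency_dimension(adj):
    """Minimum |B| such that every edge uv has some x in B with d2(x, u) != d2(x, v)."""
    vertices = list(adj)
    edges = [(u, v) for u in adj for v in adj[u]]
    for r in range(len(vertices) + 1):
        for B in combinations(vertices, r):
            if all(any(d2(adj, x, u) != d2(adj, x, v) for x in B) for u, v in edges):
                return r


K3 = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
N2 = {0: set(), 1: set()}  # two isolated vertices: no edge needs distinguishing
P4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(local_adjacency_dimension(K3), local_adjacency_dimension(N2),
      local_adjacency_dimension(P4))  # 2 0 1
```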


Theorem 2. Let $G$ be a connected graph of order $n \geq 2$, let $\{U_1, U_2, \ldots, U_k\}$ be the set of non-singleton true twin equivalence classes of $G$ and let $\mathcal{H} = \{H_1, \ldots, H_n\}$ be a family of graphs. Then

$$\dim_l(G \circ \mathcal{H}) = \sum_{i=1}^{n} \operatorname{adim}_l(H_i) + \sum_{I_N \cap U_j \neq \emptyset} \left(|I_N \cap U_j| - 1\right) + \sigma(G,\mathcal{H}).$$

The parameter $\sigma(G,\mathcal{H})$ is well defined, and we have worked on conditions under which it equals zero, for example when $N_2 \notin \mathcal{H}$.


Fig. 2: The graph $G \circ \mathcal{H}$, where $G$ is the right-hand graph shown in Figure 1 and $\mathcal{H}$ is the family composed of the graphs $H_1 = H_6 = N_2$, $H_2 = P_3$, $H_3 \cong H_4 \cong H_5 \cong K_2$. The black vertices correspond to the local adjacency basis, the light-grey vertices to the vertex cover of the twins graph, and the dark-grey vertex stands for the $\sigma$ parameter. The set of black- and grey-coloured vertices is a local metric basis of $G \circ \mathcal{H}$.
Abstract. In this article we recall the importance of mobility data analysis for the
proper prediction of human behaviours. Also, we emphasise its multiple applications 
to location-based services, recommendation systems, route planners, and so on. 

One of the key goals of mobility analysis is to predict regions of interest and the next interest point at which a given person could be found. We summarise the initial steps that we have taken towards proposing a novel method to achieve these goals.


1 Introduction 


Human behaviour is very complex and diverse. Mobility, as a component 
of human behaviour, is also complex, but its variability is lower and could 
be studied with more focused approaches. In most cases, human mobility is 
analysed with the goal of predicting future behaviours. 

Monitoring people's mobility during their daily activities is a basic requirement for providing advanced location-based services (LBS) [1,2,3]. In this sense, mobility data play a key role in the analysis of people's behaviour, including the prediction of their next location. Fortunately, thanks to the rapid improvement in the data collection capabilities of mobile devices, large amounts of people's mobility data can easily be collected at a very low cost.

There is a variety of devices to collect mobility data. Smartphones are considered one of the most appropriate options for tracking and recording user mobility during daily activities, due to their proximity to the users and their ability to carry multiple sensors. The widespread usage of smartphones, together with the development of location-based applications and services, has received considerable interest, and special attention is paid to building efficient methods for analysing and predicting the important locations where smartphone users will be next. These methods aim both to improve end-user applications, such as healthcare applications [4], recommendation systems [5], route planning, carpooling, meeting planners or location-based advertisements, and to help the corresponding institutions solve issues related to network management, healthcare, human-computer interaction, socio-economic modelling for urban planning, public transportation planning, public safety assurance, etc. [6].


Fig. 1: General architecture for next-location prediction.

Several research papers have addressed the problem of discovering interest points and regions of interest and predicting people's next locations based on GPS trajectory data. Those approaches can be roughly classified into (i) probabilistic models, such as Markov models, and (ii) supervised learning models, such as association rules, support vector machines and neural networks.

Next, we describe our initial steps towards proposing novel methods to predict regions of interest and interest points based on mobility data.


2 General Architecture 


Figure 1 shows the main steps of the next-location prediction approach that we are currently studying. In the first step, the dataset is pre-processed to remove possible noise and then to discover the interest points located in the user's movement region. The second step corresponds to building a prediction model. In the last step, the prediction model is evaluated with testing data.


2.1 Discovering the Regions with Interest Points 


In order to build an accurate prediction model, the regions of the trajectory containing interest points, which properly describe the movement of the user, must be identified. Many popular algorithms for discovering significant places rely on extracting consecutive GPS points from trajectories that satisfy some given threshold conditions (e.g. stay time, distance, etc.). If consecutive GPS points satisfy those conditions, it is assumed that a significant place has been discovered (cf. Figure 2). In general, the density of GPS points in regions with interest points is higher than elsewhere, because people tend either to move slowly or not to move at all in those regions.


Fig. 2: Discovering the interest points. (a) User trajectory. (b) The discovered interest points.
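A widely used instance of such threshold-based extraction is stay-point detection: a stay point is emitted when the user remains within a distance threshold of an anchor point for at least a time threshold. The sketch below illustrates this idea only; it is not necessarily the algorithm under study in this work. The thresholds (200 m, 20 min), the `(lat, lon, t)` input format and the function names are assumptions.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_stay_points(points, dist_thresh_m=200.0, time_thresh_s=20 * 60):
    """points: time-ordered list of (lat, lon, t_seconds).

    Returns a list of (mean_lat, mean_lon, arrival_t, leave_t) stay points.
    """
    stays = []
    i, n = 0, len(points)
    while i < n:
        # extend j while the points stay within dist_thresh_m of the anchor point i
        j = i + 1
        while j < n and haversine_m(points[i][0], points[i][1],
                                    points[j][0], points[j][1]) <= dist_thresh_m:
            j += 1
        # points[i:j] are all close to points[i]; keep them if the user stayed long enough
        if points[j - 1][2] - points[i][2] >= time_thresh_s:
            cluster = points[i:j]
            stays.append((sum(p[0] for p in cluster) / len(cluster),
                          sum(p[1] for p in cluster) / len(cluster),
                          points[i][2], points[j - 1][2]))
            i = j
        else:
            i += 1
    return stays


track = [(41.1189, 1.2445, 0), (41.1190, 1.2446, 600), (41.1188, 1.2444, 1200),
         (41.1189, 1.2445, 1800), (41.1300, 1.2500, 1900), (41.1400, 1.2600, 2000)]
print(detect_stay_points(track))  # one stay point, from t=0 to t=1800
```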

The algorithm that we are currently studying for discovering interest points must be sound and complete. An algorithm for discovering interest points is sound if it only finds true interest points, and complete if it finds all interest points. These two properties are used as metrics to evaluate the quality of the algorithm.
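Given a ground-truth set of interest points, both properties can be measured as fractions, analogous to precision (soundness) and recall (completeness). This sketch assumes planar coordinates in metres and a hypothetical matching tolerance `tol`: a detected point counts as correct if it lies within `tol` of some true interest point, and a true point counts as found if some detected point lies within `tol` of it.

```python
import math


def soundness_completeness(detected, truth, tol):
    """Return (soundness, completeness) of detected interest points w.r.t. ground truth.

    Points are (x, y) in metres; two points match if they are at most `tol` apart.
    Empty inputs are treated as vacuously sound / complete.
    """
    def matches(p, others):
        return any(math.dist(p, q) <= tol for q in others)

    soundness = (sum(matches(d, truth) for d in detected) / len(detected)
                 if detected else 1.0)
    completeness = (sum(matches(t, detected) for t in truth) / len(truth)
                    if truth else 1.0)
    return soundness, completeness


found = [(0, 0), (100, 0), (500, 500)]
actual = [(5, 5), (102, 1), (1000, 1000), (2000, 0)]
print(soundness_completeness(found, actual, tol=20))  # (0.666..., 0.5)
```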


2.2 Model building 


To predict people's future locations, learning techniques such as Markov models, association rules, Bayesian networks or neural networks are obvious candidates. One of the challenges faced by researchers when predicting people's movements is how to transfer (adapt) these techniques to work with the context information of the movements. Building an accurate prediction model for all users is hard or even impossible, because next-location prediction is a user-specific problem. Even if the visited locations overlap among different users, the sequence in which each user visits them is most likely unique. Thus, building one prediction model per user could be desirable. Usually, building the model, i.e. discovering the frequent trajectories and locations, is performed off-line, while the prediction itself is performed on-line.

We propose to apply Markov chains to address this prediction problem. The Markov chain (MC) model is a technique that naturally finds its utility in movement prediction. To build an MC prediction model, the transition matrix between the interest-point regions is computed. The rows of the matrix represent the last visited interest-point region, while the columns represent the next interest-point region. If the user never moves between two interest-point regions, the corresponding transition value is set to zero.
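A minimal first-order sketch of this idea follows. It assumes that each trajectory has already been reduced to a sequence of interest-point region labels (the labels `"home"`, `"work"` and `"gym"` are hypothetical). Transition counts are normalised per row into probabilities, and the most probable next region is predicted. A region that is never left keeps an all-zero row, matching the convention above.

```python
def transition_matrix(sequences, regions):
    """Row-normalised first-order transition probabilities between regions."""
    index = {r: k for k, r in enumerate(regions)}
    n = len(regions)
    counts = [[0] * n for _ in range(n)]
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[index[current]][index[nxt]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # a region that is never left keeps an all-zero row
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def predict_next(matrix, regions, current):
    """Most probable next region after `current`, or None if no transition was observed."""
    row = matrix[regions.index(current)]
    if not any(row):
        return None
    return regions[max(range(len(row)), key=row.__getitem__)]


regions = ["home", "work", "gym"]
history = [["home", "work", "home", "gym", "home", "work"]]
P = transition_matrix(history, regions)
print(P[0])                              # [0.0, 0.666..., 0.333...]
print(predict_next(P, regions, "home"))  # work
```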
We extend the common MC model by including the notion of time. Both the transition matrix and a time matrix between the interest-point regions are computed; the time matrix represents the movement times between locations. We propose to integrate the transition and time matrices into a single matrix, called the Tran-Time matrix. The proposed model thus captures the spatio-temporal properties that emerge from the GPS coordinates and their associated times.
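The exact form of the Tran-Time matrix is not detailed here, so the following is only one possible sketch under stated assumptions. Each visit is a hypothetical `(region, arrival_time, departure_time)` tuple, the movement time of a transition is the arrival at the next region minus the departure from the previous one, and each cell of the combined matrix simply stores the pair (transition probability, mean movement time).

```python
def tran_time_matrix(trajectories, regions):
    """Cell [i][j] = (P(next = j | current = i), mean movement time i -> j) or (0.0, None)."""
    idx = {r: k for k, r in enumerate(regions)}
    n = len(regions)
    counts = [[0] * n for _ in range(n)]
    time_sum = [[0.0] * n for _ in range(n)]
    for visits in trajectories:  # visits: list of (region, arrival_time, departure_time)
        for (r1, _, leave1), (r2, arrive2, _) in zip(visits, visits[1:]):
            i, j = idx[r1], idx[r2]
            counts[i][j] += 1
            time_sum[i][j] += arrive2 - leave1  # movement time between the two regions
    matrix = []
    for i in range(n):
        total = sum(counts[i])
        matrix.append([(counts[i][j] / total, time_sum[i][j] / counts[i][j])
                       if counts[i][j] else (0.0, None) for j in range(n)])
    return matrix


day = [("home", 0, 100), ("work", 130, 500), ("home", 560, 900), ("work", 920, 1000)]
M = tran_time_matrix([day], ["home", "work"])
print(M[0][1], M[1][0])  # (1.0, 25.0) (1.0, 60.0)
```

Keeping probability and time together in each cell lets a predictor answer both "where next?" and "roughly when?" from a single lookup.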


3 Conclusion 


This brief article presents a general framework for predicting the next locations of users. Discovering the interest points in the user's movement area and then predicting the future context information of people's movements are the two main steps of the prediction model. Soundness and completeness are used as metrics to evaluate the proposed algorithms for discovering interest points. We have described how the MC model can be extended to include the time associated with the GPS points.

We are currently investigating the application of our model to real data 
and we expect to obtain preliminary results in the next few months. 


Acknowledgement. The author holds a Martí-Franquès grant at the Universitat Rovira i Virgili, Tarragona, Spain.
Abstract. Underwater sensor networks are an important field of research. Several applications, such as tsunami or oil spill alerts, require this kind of network. The underwater medium is very harsh and only acoustic signals can be used for transmitting information. These networks are still under development, far from a standard consensus on basic aspects such as carrier frequency or modulation techniques. The use of these networks for real-time applications has not been analyzed previously. This paper summarizes [1], where we present two solutions for the scheduling of real-time messages and provide a time-constraint analysis of the network performance.


Keywords: Underwater Sensor Networks, Acoustic Sensor Networks, Environmental Monitoring


1 Introduction 


Underwater acoustic wireless sensor networks (WSN) are becoming a hot research topic, as they have become the primary tool to monitor and act upon the well-being of marine environments. Radio-frequency electromagnetic signals do not propagate well underwater: a huge amount of power is required to transmit messages even over short distances. The presence of particles and moving obstacles, such as fish, prevents the use of optical carriers. For underwater transmissions, acoustic carriers are the best option. While WSN based on RF transmissions have been studied and several protocols have been proposed for them, those solutions are not useful for acoustic underwater networks, since the propagation delay is usually larger than the transmission time. A message may be received well after its transmission has finished at the source node.
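A back-of-the-envelope calculation illustrates the last point. It assumes a sound speed of about 1500 m/s in seawater and, as hypothetical link parameters, a 1 km hop, a 256-bit message and a 5 kbit/s acoustic modem.

```python
SOUND_SPEED_M_S = 1500.0  # approximate speed of sound in seawater
LIGHT_SPEED_M_S = 3.0e8   # RF propagation speed, for comparison


def propagation_delay_s(distance_m, speed_m_s):
    """Time for a signal to travel distance_m at speed_m_s."""
    return distance_m / speed_m_s


def transmission_time_s(message_bits, bitrate_bps):
    """Time needed to put message_bits on the channel at bitrate_bps."""
    return message_bits / bitrate_bps


acoustic = propagation_delay_s(1000.0, SOUND_SPEED_M_S)  # ~0.667 s
radio = propagation_delay_s(1000.0, LIGHT_SPEED_M_S)     # ~3.3 microseconds
tx = transmission_time_s(256, 5000.0)                    # 0.0512 s
print(f"acoustic {acoustic:.3f} s, RF {radio * 1e6:.1f} us, transmission {tx:.4f} s")
```

Under these assumptions the acoustic propagation delay is about 13 times the transmission time, so the sender finishes transmitting roughly 0.6 s before the first bit reaches the receiver, whereas with RF the propagation delay is negligible.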

Real-time (RT) communications require not only that messages are transmitted properly, but also that they arrive before a particular instant named the deadline. If the deadline is missed, the message is no longer valid, which may have serious consequences. A feasible RT schedule is one in which all messages meet their deadlines. RT message scheduling in multi-hop networks is a complex problem that requires the use of routing and queueing techniques. If every node in the network has a direct link to every other node, the problem may be solved using an integer linear programming approach. However, when a message must go through intermediate nodes, the question is not only when a node should transmit (the MAC problem) but also which path to select. In this case, the shortest path is not always the best one, as per-node scheduling must be incorporated into the analysis: a node holding more than one message has to schedule their transmissions, which introduces additional delays.

In this paper we extend the algorithm presented in [2] to include RT constraints and message transfers between any pair of nodes in the system. A TDMA (Time Division Multiple Access) protocol is proposed, together with an off-line allocation and scheduling algorithm. Feasibility conditions are given for the system to operate with hard RT constraints.


2 System Model 


For the sake of simplicity, we assume that the propagation delay between two nodes within transmission range is the same in both directions. Any node may transmit a message to any other node in the network if there is a valid path between them. We denote a message from node $a$ to node $b$ as $m_{ab}$. We also assume that all messages require one time slot to be transmitted and that they are sent periodically. Additionally, every message must be received before its associated deadline. $P_{ab}$ and $D_{ab}$ represent the period and deadline of $m_{ab}$, respectively. In general, we define $Z = \{m_{ij}(P_{ij}, D_{ij})\}$.

The network can be modeled as a directed graph $G = (V, E)$, in which $V$ is the set of nodes in the network and $E$ the set of edges. If two nodes $u$ and $v$ are within transmission range, there is an edge connecting them, $e = (u,v)$. Each edge has a label $T_{uv}$ that represents the delay between the nodes, measured in time slots. As collisions only matter if they occur at the receiving node, there are four different scenarios, as stated in [3].
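To make the collision condition concrete, here is a simplified receiver-side check for a TDMA frame. It rests on our own assumptions, not on the model of [3]: node-based allocation with one transmission slot per node per frame, integer delays `T[(u, v)]` in slots stored for both directions, and half-duplex nodes. A transmission from `u` in slot `s` reaches `v` in slot `(s + T_uv) mod F`; it is lost if another neighbour's signal arrives at `v` in the same slot or if `v` is itself transmitting then. Names are hypothetical.

```python
def collision_free(link, schedule, delay, frame_len):
    """Check that the transmission on link (u, v) is received cleanly at v.

    schedule: node -> transmission slot in the frame (one slot per node, assumed).
    delay: (x, y) -> propagation delay in whole slots, stored for both directions.
    frame_len: number of slots F in the TDMA frame.
    """
    u, v = link
    arrival = (schedule[u] + delay[(u, v)]) % frame_len
    if schedule.get(v) == arrival:
        return False  # half-duplex: v is transmitting when u's message arrives
    for (w, x), t in delay.items():
        # any other neighbour w of v whose signal lands at v in the same slot interferes
        if x == v and w != u and w in schedule and (schedule[w] + t) % frame_len == arrival:
            return False
    return True


T = {("A", "B"): 1, ("B", "A"): 1, ("B", "C"): 2, ("C", "B"): 2}
print(collision_free(("A", "B"), {"A": 0, "B": 2, "C": 3}, T, 4))  # False: C's signal also lands in slot 1
print(collision_free(("A", "B"), {"A": 0, "B": 2, "C": 2}, T, 4))  # True
```

Note that, because of the propagation delays, two nodes may collide at a receiver even when they use different slots, which is why the slot assignment cannot ignore the edge labels.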

We propose a slot allocation method that orders the access of the nodes to the channel in such a way that each message originating at a node may reach its destination node without collisions. We begin by considering that destination nodes are within transmission range of the source node, and later we extend the analysis to nodes at larger distances. Stated in this way, the slot assignment problem is an extension of the graph coloring problem. We present an integer linear programming (ILP) model to minimize the frame length measured in slots. The model is significantly more complex if a per-message slot allocation is performed. Further details can be found in [3].
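For intuition about the coloring connection, the sketch below solves only the simplest version of the problem. It ignores propagation delays and forbids nodes within two hops of each other from sharing a slot (distance-2 coloring), and it uses a greedy heuristic instead of the ILP described above. It therefore yields only an upper bound on the minimum frame length for this delay-free model. Names are hypothetical.

```python
def two_hop_conflicts(adj):
    """Nodes that must not share a slot with v: its neighbours and their neighbours."""
    conflicts = {v: set() for v in adj}
    for v in adj:
        for u in adj[v]:
            conflicts[v].add(u)
            conflicts[v].update(w for w in adj[u] if w != v)
    return conflicts


def greedy_slots(adj):
    """Greedy distance-2 colouring; returns (slot per node, frame length in slots)."""
    conflicts = two_hop_conflicts(adj)
    slot = {}
    # most constrained nodes first (Welsh-Powell-style ordering), ties broken by name
    for v in sorted(adj, key=lambda x: (-len(conflicts[x]), str(x))):
        used = {slot[w] for w in conflicts[v] if w in slot}
        s = 0
        while s in used:
            s += 1
        slot[v] = s
    frame_length = max(slot.values()) + 1 if slot else 0
    return slot, frame_length


line = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
slots, frame = greedy_slots(line)
print(slots, frame)  # A and D can reuse the same slot; frame length 3
```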
3 Scheduling


Path discovery is a well-known problem in networking. Several algorithms have been proposed to compute the best path for a message to reach its destination from a source. The most common solutions are based on Dijkstra's algorithm to determine the shortest path from any node in the network to any other node (SPF, shortest path first). In communication networks, the cost associated with the edges may be related to the actual delay between the nodes, an economic cost for using that link (paying for a service from a third-party company) or the power required to use the link. For real-time messages, the total delay along the path must be less than or equal to the deadline of the message. If this condition is not guaranteed, the message is not schedulable and the network does not comply with the real-time requirements.
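A minimal sketch of SPF routing with a deadline test follows. It assumes that each link cost is its delay in slots (the values below are hypothetical) and it ignores the per-node queueing delays discussed in the introduction, which is precisely why a shortest path alone is not always sufficient in the real-time case.

```python
import heapq


def dijkstra(graph, source):
    """graph: dict node -> dict neighbour -> link delay. Returns (delay, predecessor) maps."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def route(prev, source, target):
    """Rebuild the shortest path (target assumed reachable from source)."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def meets_deadline(graph, source, target, deadline):
    """True if the shortest end-to-end delay from source to target is <= deadline."""
    dist, _ = dijkstra(graph, source)
    return target in dist and dist[target] <= deadline


links = {"A": {"B": 2, "C": 5}, "B": {"C": 1, "D": 6}, "C": {"D": 2}, "D": {}}
dist, prev = dijkstra(links, "A")
print(dist["D"], route(prev, "A", "D"))   # 5 ['A', 'B', 'C', 'D']
print(meets_deadline(links, "A", "D", 4))  # False
```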


4 Heuristic approach 


In the proposed model, the variables that affect the communication speed, and therefore the timing of the system, are the frame duration, the order of transmission and reception of messages, and the route that each message follows within the network. In Section 5 of [1], a heuristic algorithm is presented that optimizes the message/slot allocation to minimize the frame size and guarantee the deadlines. The minimum-length frame is not necessarily optimal for meeting the system's timing requirements, and this prevents decoupling the computation of the frame from the computation of the routes. The heuristic generates a fixed-length frame and optimizes the paths of the messages to meet all system deadlines. It is better suited for complex problems in which multiple paths exist for different messages.


5 Real-time applications 


Tsunamis are generated by earthquakes in the ocean and can be tragic, like the ones in Japan in 2011 or Indonesia in 2004. In the case of Japan the number of casualties associated with the tsunami was relatively low, whereas in Indonesia the number of victims was far higher (in addition to severe economic losses in infrastructure). The difference in the number of victims is attributed to the early alert that people in Japan received, which allowed them to reach a safe place.

Detecting a tsunami is hard. Seismic sensors may be deployed in the area where the earthquake may take place (the geologic fault), and if an earthquake is detected, a tsunami alert may be issued depending on its intensity. The time available between the earthquake and the arrival of the wave at the beach depends on the distance to the earthquake epicenter. In any case, there is clearly a hard RT restriction, as the alert must be issued with enough time for people to move to a safe place.

The system may include buoys anchored along the fault and linked to the seismic sensors, so that once an earthquake is detected, a buoy connects through a satellite network to a disaster management office, reporting the event, its intensity and the tsunami probability. However, buoys are vandalized by pirates or even fishermen, jeopardizing the network operation. To avoid this, an underwater acoustic WSN operating in RT is proposed. A discussion of network deployment, node distribution and number of hops is beyond the scope of this paper; however, the RT analysis and network performance modeling proposed here can be used to set up the appropriate network.


6 Conclusions and Future Work 


We presented an RT analysis of an underwater acoustic WSN with two approaches. First, the network is analyzed with integer linear programming techniques. SPF is used as the routing policy, combined with a per-message or per-node slot allocation procedure in a TDMA frame. We presented the schedulability condition for the case in which messages are transmitted following a FIFO policy. This scheduling discipline is quite simple and requires little processing within the underwater nodes. However, better results may be obtained if RT priority policies are implemented (future work). The second solution is based on a heuristic approach. Messages are scheduled following a per-link approach, finding the route with the lowest delay. This solution improves on the two-step approach of first finding the SPF routes and then allocating the slots within the frame. As the heuristic only considers the messages actually being transmitted, unnecessary restrictions are avoided. We also presented a real application (tsunami early alert) in which RT transmissions are necessary.


Acknowledgement. This work has been partially supported by the Spanish MCI and 
FEDER funds of the EU under contracts TIN2013-44375-R, TIN2013-47245-C2-1-R, 
TIN2013-47245-C2-2-R, and also the Community Networks Testbed for the Future 
Internet (CONFINE) project: FP7-288535. 
Abstract. In some applications, semantic information plays an important role in the decision process. This is the case of recommender systems in tourism [2]. When a tourist has to decide on her destination, textual characteristics (i.e. tags) of the different places are key elements to be taken into account (e.g. types of activities to do, main landmarks, etc.). The appropriate analysis of this kind of information is crucial in the development of a new generation of semantic recommender systems. This thesis aims to develop new decision aiding tools that incorporate semantic criteria
together with numerical ones. Semantic criteria are categorical multi-valued variables 
whose values are tags (i.e. terms or words) that can be interpreted at a conceptual 
level. Ontologies are knowledge representation structures that enable this semantic 
interpretation by exploring the relationships between the terms found in the semantic 
variable. Furthermore, they may also be used to store the user’s preferences. The 
evaluation of the new semantic knowledge management methods proposed in this 
thesis will be done in collaboration with the Scientific and Technological Research 
Park in Tourism and Leisure (Vila-seca, Tarragona).


1 Introduction 


Multi-criteria decision aiding (MCDA) is a well-established discipline focused on proposing decision support tools for problems involving multiple and conflicting criteria [3]. The decision problem may be defined as a choice, ranking or sorting of a set of alternatives, based on their performance on a set of criteria. Although the classic methods were based on operational research and economic theories, nowadays they integrate techniques from other fields, especially from Artificial Intelligence. There are three main methodological approaches in MCDA: utility theory, outranking methods and rule-based systems.

This thesis focuses on the outranking method ELECTRE-III, which is based on the construction of a pairwise outranking matrix that represents the preference structure among a set of alternatives. Once the outranking matrix has been calculated, different exploitation procedures allow one to make a choice (i.e. select the best alternatives), to rank all alternatives or to sort the alternatives into some predefined and ordered classes.
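For reference, the classic numerical ELECTRE-III construction of the outranking matrix (Roy's formulation) can be sketched as follows. The sketch assumes that all criteria are to be maximised (a cost criterion can simply be negated) and uses the usual indifference, preference and veto thresholds $q_j < p_j < v_j$; names and numbers are illustrative. The semantic extension described in this thesis redefines the partial concordance and discordance indices computed by the first two functions.

```python
def partial_concordance(ga, gb, q, p):
    """c_j(a, b) for a maximised criterion, with indifference q < preference p."""
    diff = gb - ga  # how much b beats a on this criterion
    if diff <= q:
        return 1.0
    if diff >= p:
        return 0.0
    return (p - diff) / (p - q)


def partial_discordance(ga, gb, p, v):
    """d_j(a, b) with preference p < veto v."""
    diff = gb - ga
    if diff <= p:
        return 0.0
    if diff >= v:
        return 1.0
    return (diff - p) / (v - p)


def credibility(a, b, weights, q, p, v):
    """Credibility sigma(a, b) of 'a outranks b' from performance vectors a and b."""
    conc = sum(w * partial_concordance(x, y, qj, pj)
               for x, y, w, qj, pj in zip(a, b, weights, q, p)) / sum(weights)
    sigma = conc
    for x, y, pj, vj in zip(a, b, p, v):
        d = partial_discordance(x, y, pj, vj)
        if d > conc:  # only criteria whose discordance exceeds the concordance weaken aSb
            sigma *= (1.0 - d) / (1.0 - conc)
    return sigma


def outranking_matrix(alternatives, weights, q, p, v):
    """Credibility matrix; the diagonal is set to 1 by convention."""
    n = len(alternatives)
    return [[1.0 if i == k else credibility(alternatives[i], alternatives[k], weights, q, p, v)
             for k in range(n)] for i in range(n)]


W, Q, P, V = [1, 1], [1, 1], [3, 3], [6, 6]
print(credibility([10, 5], [8, 7], W, Q, P, V))  # 0.75
print(credibility([10, 0], [8, 7], W, Q, P, V))  # 0.0, vetoed by the second criterion
```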

Up to now, outranking methods have mainly considered numerical and ordinal scales in the set of criteria. In this PhD thesis I study the use of criteria built upon semantic variables, including additional domain knowledge by means of a background domain ontology. The ontology is a knowledge representation structure that enables an exploration of the semantic relationships between the tags found in the semantic variable. We enrich the ontology with information about the user's preferences on the tags. Figure 1 illustrates an example of a recommender system for tourists visiting the province of Tarragona. The alternatives under consideration are a set of activities that can be done in this territory. Each activity is described by a numerical criterion (the cost of the activity) and two multi-valued semantic ones (a description of the categories in which the activity may be classified, and a list of the most adequate types of weather to perform the activity). Depending on the interests (i.e. preferences, goals) of the tourist, the best alternative will be a different one. For a family interested in cultural events and visiting on a rainy day, the archeological museum should be the most suitable option, whereas for a sporty young person with no money restrictions, some extreme sports on the Montsant mountain could be recommended. Notice that each semantic criterion can use a different domain ontology to support the analysis of its tags.


Activity name | Touristic description (semantic criterion) | Cost | Best weather (semantic criterion)
Montsant Mountain | Paragliding, ClimbingWall, Rappelling | [illegible] | NoPrecipitation, PartlyCloudy
Tarragona Beach | BeachPicnic, FamilyBeaches, Sunbathing, Boating | 40€ | HighSun, LightAir
Archeological Museum | [illegible], HistoricBuilding | [illegible] | LightPrecipitation, NeutralState, OverCast
Adventure and Journey | HorseRiding, Car4x4, PaintBall, ShoppingArea | 60€ | ModerateSun, OptimumHumidity


Fig. 1: Example of a data matrix with semantic and numerical criteria 


2 Including semantic criteria in the ELECTRE method 


The first task of the thesis has been the redefinition of the procedure for constructing a valued outranking relation in ELECTRE from semantic multi-valued variables. For each alternative, a semantic variable may have a list of tags (i.e. terms). These tags are mapped to the concepts of a background ontology so that they can be compared semantically, using an appropriate ontology-based semantic similarity measure.
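The measure is not fixed here. As one common example, the sketch below uses the Wu-Palmer similarity, $2\,\mathrm{depth}(\mathrm{LCS}) / (\mathrm{depth}(c_1) + \mathrm{depth}(c_2))$, where LCS is the lowest common subsumer. It assumes a hypothetical toy taxonomy stored as a child-to-parent map with a single root of depth 1.

```python
def ancestors(concept, parent):
    """Path from concept up to the root of a single-rooted taxonomy."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def wu_palmer(c1, c2, parent):
    """Wu-Palmer similarity; the root has depth 1, so identical concepts score 1.0."""
    up1, up2 = ancestors(c1, parent), ancestors(c2, parent)
    common = set(up2)
    lcs = next((c for c in up1 if c in common), None)  # lowest common subsumer
    if lcs is None:
        return 0.0  # no shared ancestor (e.g. disjoint taxonomies)
    depth = len(ancestors(lcs, parent))
    return 2.0 * depth / (len(up1) + len(up2))


toy_taxonomy = {  # hypothetical child -> parent links
    "Sport": "Activity", "Culture": "Activity",
    "Paragliding": "Sport", "ClimbingWall": "Sport", "Museum": "Culture",
}
print(wu_palmer("Paragliding", "ClimbingWall", toy_taxonomy))  # 2*2/(3+3), about 0.667
print(wu_palmer("Paragliding", "Museum", toy_taxonomy))        # 2*1/(3+3), about 0.333
```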

ELECTRE is a well-known decision aiding method that constructs a valued outranking relation using pseudo-criteria with discrimination thresholds to manage the uncertainty in the data [3]. For a pair of alternatives $a$ and $b$, the procedure for calculating the credibility of the outranking relation $aSb$ is based on two indices inspired by social voting mechanisms: concordance (i.e. overall majority support for $aSb$) and discordance (i.e. respect for minority opinions against $aSb$). For numerical and ordinal criteria, the strength of the partial concordance or discordance about $aSb$ for a certain criterion $g_j$ is obtained by comparing the performances of both alternatives, $g_j(a)$ and $g_j(b)$. The partial concordance index takes into account two discrimination thresholds, $q_j$ (indifference) and $p_j$ (preference), while the discordance index has a veto threshold $v_j$ that determines the degree of opposition to the assertion $aSb$. We have proposed a redefinition of the partial concordance and discordance indices to tackle the case of semantic criteria. First, we defined the Tag Interest Score $TIS(c)$ of a concept $c$, a numerical value from 0 to 1 that represents the suitability of $c$ for a certain user. Second, the Semantic Win Rate $SWR_j(a,b)$ is defined as a numerical value that indicates the degree of preference of alternative $a$ with respect to $b$ on the semantic criterion $g_j$. It is based on the evaluation of the user's preferences (stored in the ontology) about the two sets of tags $g_j(a) = \{t_{1,a}, t_{2,a}, t_{3,a}, \ldots, t_{|g_j(a)|,a}\}$ and $g_j(b) = \{t_{1,b}, t_{2,b}, \ldots, t_{|g_j(b)|,b}\}$. As $SWR_j(a,b)$ is a rate that represents the comparison of the performance of $a$ over $b$, the ELECTRE-III thresholds are now defined as follows:


• $q_j$ is the minimum value of $SWR_j(a,b)$ required to consider maximum concordance with $aSb$.

• $p_j$ indicates the maximum difference between $SWR_j(a,b)$ and $q_j$ that still shows some preference of $a$ with regard to $b$, thus still supporting the relation $aSb$ to a certain degree.

• $v_j$ is the veto threshold, which indicates the minimum negative difference between $SWR_j(a,b)$ and $q_j$ that entails full discordance with the outranking relation.
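To make the role of the three thresholds concrete, the following is a minimal Python sketch of partial concordance and discordance indices computed from a given $SWR_j(a,b)$. It assumes, as in standard ELECTRE-III, that both indices vary linearly between their thresholds, that discordance starts once the shortfall below $q_j$ exceeds $p_j$, and that $SWR_j(a,b)$ lies in [0, 1]. The function names are hypothetical, and the computation of $SWR_j(a,b)$ itself is not shown because the text does not specify it.

```python
def semantic_concordance(swr: float, q: float, p: float) -> float:
    """Partial concordance c_j(a, b) on a semantic criterion g_j.

    swr: Semantic Win Rate SWR_j(a, b), assumed to lie in [0, 1].
    q:   minimum SWR for maximum concordance (q_j).
    p:   largest shortfall below q_j that still gives some support (p_j > 0).
    """
    shortfall = q - swr
    if shortfall <= 0:
        return 1.0              # SWR reaches q_j: full support for aSb
    if shortfall >= p:
        return 0.0              # too far below q_j: no support
    return (p - shortfall) / p  # linear in between (assumed, ELECTRE-III style)


def semantic_discordance(swr: float, q: float, p: float, v: float) -> float:
    """Partial discordance d_j(a, b); assumes 0 < p < v.

    No opposition while the shortfall below q_j is at most p_j; full
    discordance (veto) once it reaches v_j; linear in between (assumed).
    """
    shortfall = q - swr
    if shortfall <= p:
        return 0.0
    if shortfall >= v:
        return 1.0
    return (shortfall - p) / (v - p)
```

For example, with $q_j = 0.6$, $p_j = 0.2$ and $v_j = 0.4$, an SWR of 0.5 yields a partial concordance of about 0.5 and no discordance, while an SWR of 0.1 triggers the veto.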


3 Constructing a semantic user profile 


User profiling is required in many decision support systems. It is becoming increasingly common to find decision problems involving non-numerical data, such as multi-valued semantic criteria, which take as values the concepts of a given domain ontology. Different models for representing preferences have been reviewed in [1]. Building on that review, we propose to create a semantic user profile by storing preference scores in the ontology. This preferential information can later be exploited to rank and recommend the most suitable alternatives for each user with the ELECTRE method (or other decision aiding methods).

As mentioned before, ontologies store domain information in the form of concepts and taxonomic relations. In this work we propose to attach a numerical interest score to the most specific concepts (i.e., the leaves of the taxonomy). Because this score is associated with very detailed concepts, we are able to better distinguish the preferences of the user, improving the quality of the decision. Since ontologies usually have hundreds of concepts, it is not feasible to obtain all their scores at the beginning. Therefore, an inference procedure has been designed to estimate the score of a concept $c$ whose preference is unknown. The basic idea is to find a subset of concepts semantically similar to $c$ and to aggregate their scores. After studying the literature on aggregation operators [6], we propose using the WOWA (Weighted Ordered Weighted Average) operator with two weighting factors: OWA weights define the pessimistic/optimistic aggregation policy, while criteria weights give different importance to the aggregated values in terms of their semantic distance to $c$ [5]. We are currently studying which WOWA parameters are most appropriate depending on the structure of the ontology and the number of missing tag interest scores.
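A minimal Python sketch of the WOWA operator may help to see how the two weight vectors interact. It follows Torra's general construction (the OWA weights define a monotone function $w^*$ through the points $(k/n, w_1 + \dots + w_k)$, and the criteria weights decide how much of that function each value consumes); the piecewise-linear interpolation of $w^*$ is our assumption, since other monotone interpolations are also admissible. The helper `infer_tis`, which uses normalised similarities to $c$ as criteria weights, is hypothetical and only illustrates the inference idea described above.

```python
def wowa(values, p, w):
    """Weighted OWA of `values` with criteria weights `p` and OWA weights `w`.

    Both weight vectors are assumed non-negative, of length len(values),
    and to sum to 1.
    """
    n = len(values)
    # Points (k/n, w_1 + ... + w_k) that define the quantifier w*.
    cum_w = [0.0]
    for wk in w:
        cum_w.append(cum_w[-1] + wk)

    def w_star(x):
        # Piecewise-linear interpolation of the points above (an assumption).
        x = min(max(x, 0.0), 1.0)
        k = min(int(x * n), n - 1)
        t = x * n - k
        return cum_w[k] + t * (cum_w[k + 1] - cum_w[k])

    # Visit values from largest to smallest, as OWA does.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    result, acc = 0.0, 0.0
    for i in order:
        prev, acc = acc, acc + p[i]
        result += (w_star(acc) - w_star(prev)) * values[i]
    return result


def infer_tis(neighbour_scores, similarities, owa_weights):
    """Hypothetical helper: estimate TIS(c) from semantically similar concepts.

    Criteria weights are the normalised similarities to c, so closer concepts
    count more; the OWA weights set the pessimistic/optimistic policy.
    """
    total = sum(similarities)
    p = [s / total for s in similarities]
    return wowa(neighbour_scores, p, owa_weights)
```

With uniform OWA weights the operator reduces to a weighted mean, and with uniform criteria weights it reduces to a plain OWA; putting all OWA weight on the last position gives the most pessimistic estimate (the minimum).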
1 Introduction


Nowadays, the population shift from rural to urban areas poses severe chal- 
lenges to cities. In big urban areas, factors related to economies of scale help 
to reduce operational costs. However, managing such cities is challenging due 
to the large number of inhabitants and their needs. Thus, such management 
procedures have to be adapted to a growing and very demanding population. 
The citizens’ quality of life is one of the most relevant aspects in this sce- 
nario. Therefore, healthcare services are also taken into account, due to their 
relevance and the potentially high cost inherent in service provision, which is 
constantly increasing due to the population growth and the increase in life 
expectancy. Smart health (s-health), which is understood as a natural evolu- 
tion of e-health in the context of smart cities, was introduced by Solanas et 
al., and can be defined as follows: 


“Smart health (s-health) is the provision of health services by using 
the context-aware network and sensing infrastructure of smart cities.” 
Solanas et al. [13] 


However, this new health paradigm has evolved to transcend the boundaries of smart cities and to be fully applicable to any context-aware environment, with the aim of improving people's quality of life. Notably, s-health is a subclass of e-health because, like mobile health (m-health), it is founded on ICT. However, it differs from m-health in that the underlying infrastructure is not necessarily mobile; in most cases it is static. With the aim of providing a long-term sustainable healthcare system, optimized organization and management systems are combined in novel healthcare service implementations. In this way, new models based on e-health, m-health, s-health and Ambient Assisted Living, which rely on mobile communication systems, distributed sensor networks and optimized middleware, have been proposed and are currently being deployed following different strategies (e.g., in conjunction with telecom operators, hybrid mobile WLAN-WSN implementations, etc.). Therefore, one of the main enablers of context-aware environments is the use of different wireless communication systems, which provide seamless connectivity to a potentially very large set of transceivers embedded in mobile and wearable terminals, WLAN hot spots or dense wireless sensor network deployments.


2 Wireless Channel Characterization 


In recent decades, the use of wireless systems has increased substantially, given the popularity of mobile networks, wireless LANs and wireless sensor networks. The advent of context-aware environments, mainly driven by the trend towards smart city/smart region development, will further increase the deployment of 4G mobile networks and IoT, and drive the overall evolution towards the high capacity and capillarity of 5G systems. In this scenario, one of the main considerations is to control interference precisely in order to increase coverage/capacity ratios. Given the wide variety of existing wireless systems and the inherent complexity of large, dense urban scenarios, radio-planning tasks are essential to fully account for useful server signals as well as intra-system and inter-system interference sources. Several techniques can be used for this purpose, ranging from semi-empirical regression models, which are measurement-dependent and exhibit large errors, to deterministic techniques such as full-wave electromagnetic simulation. Deterministic Ray Launching (RL) methods offer a good trade-off between precision and computational cost. Performing real measurements, on the other hand, is very time-consuming and becomes impractical in large, complex scenarios [1]. To avoid this burden, simulation techniques based on Ray Tracing, combining Geometric Optics and the Uniform Theory of Diffraction, are used to predict wave behaviour within a given environment. These simulations depend on a number of parameters, namely angular resolution, number of rebounds, cell size, etc. By tuning these parameters, high-definition (HD) and low-definition (LD) results can be obtained. Although simulations are more practical than manual measurements, the computational cost of HD simulations prevents their use in complex environments, so their LD counterparts are applied instead.


3 Recommender Systems and Collaborative Filtering 


Recommender systems [11] play an active role on the Internet thanks to advances in data mining and artificial intelligence. Collaborative Filtering (CF) [8] is a kind of recommender system that comprises a large family of recommendation methods. The aim of CF is to make suggestions on a set of items $I$ (e.g., restaurants, films or routes) based on the preferences of a set of users $U$ that have already acquired and/or rated some of those items. Recommendations provided by CF methods are based on the premise that similar users are interested in similar items (i.e., they share similar patterns). Therefore, items that pleased user $u_a$ could be recommended to user $u_b$ if $u_a$ and $u_b$ are similar. To predict whether an item would interest a given user, CF methods rely on a matrix $M$ of $n$ users (rows) and $m$ items (columns), where each cell $M_{i,j}$ stores the rating of user $i$ on item $j$. The interested reader can refer to [5,6,7,12] for a detailed overview of the CF state of the art.
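The text does not state which CF variant is used, so as a generic illustration here is a minimal Python sketch of user-based CF on the matrix $M$: Pearson similarity over co-rated items, followed by a mean-centred, similarity-weighted prediction from the $k$ most similar users (a classic Resnick-style scheme). Missing ratings are represented as `None`; the function names and the default `k` are our own choices, not taken from the cited works.

```python
import math


def pearson(u, v):
    """Pearson correlation of two users over the items both have rated."""
    common = [j for j in range(len(u)) if u[j] is not None and v[j] is not None]
    if len(common) < 2:
        return 0.0
    mu = sum(u[j] for j in common) / len(common)
    mv = sum(v[j] for j in common) / len(common)
    num = sum((u[j] - mu) * (v[j] - mv) for j in common)
    den = math.sqrt(sum((u[j] - mu) ** 2 for j in common)) * math.sqrt(
        sum((v[j] - mv) ** 2 for j in common))
    return num / den if den > 0 else 0.0


def mean_rating(row):
    """Average of the ratings a user has actually given."""
    rated = [r for r in row if r is not None]
    return sum(rated) / len(rated)


def predict(M, i, j, k=2):
    """Predict the missing rating M[i][j] from the k most similar users."""
    # Step 1: neighbourhood search among users who rated item j.
    candidates = [(pearson(M[i], M[u]), u) for u in range(len(M))
                  if u != i and M[u][j] is not None]
    neighbours = sorted(candidates, reverse=True)[:k]
    # Step 2: mean-centred, similarity-weighted prediction.
    base = mean_rating(M[i])
    num = sum(s * (M[u][j] - mean_rating(M[u])) for s, u in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return base if den == 0 else base + num / den
```

Mean-centring lets users who rate systematically higher or lower still contribute useful information about relative preferences.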


4 Hybrid method for Wireless Channel Characterization 


One of the issues to consider in the design of communication networks for s-health scenarios is their performance in terms of coverage/capacity ratios [4,9], with particular consideration of the impact of interference due to the simultaneous activity of multiple users and systems [10]. In this case, a careful radiofrequency signal analysis is required, in which useful signal transmission and potential interference levels must be estimated as a function of user density, transceiver type and location. As previously stated, wireless signal analysis in large, complex scenarios is computationally costly and requires the use of optimized deterministic techniques. In very large scenarios, such as cities, this approach can still be too computationally demanding, and it must be combined with other estimation approaches [1]. To minimize the computational cost in certain scenarios, in [2] we proposed combining our in-house 3D Ray Launching code with CF techniques. The main idea of our hybrid proposal is to exploit the ability of CF methods to predict ratings in order to estimate the values of empty cells in LD simulations. Therefore, we implemented a hybrid method that follows a two-step procedure (i.e., neighbourhood search and recommendation/prediction computation) to estimate the power level of empty/error cells so that they are as similar as possible to the values that would have been obtained in an HD simulation. Figure 1 shows an example of the outcomes obtained by the hybrid approach. Our proposal has been tested in different scenarios, such as medical emergency rooms [3], houses with a specific room layout [10] and university buildings with different kinds of laboratories and offices [2]. The outcomes of these experiments showed that our proposal was not only more accurate than other state-of-the-art methods but also more efficient in terms of computational cost. Therefore, we conclude that our proposal is both accurate and efficient, and that it could be used to enhance the accuracy of LD simulations in diverse scenarios.
Fig. 1: Received power level estimation when using HD (top), LD+CF (middle) and LD (bottom).


5 Conclusions 


In this paper, we have stated the importance of healthcare systems in the context of smart cities. Moreover, we have pointed out the relevance of the new health paradigm, smart health, and the importance of increasing the coverage/capacity ratios of wireless systems in context-aware scenarios such as dense urban areas or hospitals. We have shown that CF methods can be successfully applied to improve the accuracy of LD simulations performed by an RL approach, and that our proposal is fast and could help to reduce HD simulation costs. Therefore, Collaborative Filtering could be integrated with the sensing infrastructure of smart cities to improve sustainability by optimizing resource usage in communication networks.
1 Introduction


As a tool for personal storage, file synchronization and data sharing, cloud 
storage services such as Dropbox, Box, Google Drive, etc., have quickly gained 
popularity in the last few years. These services provide users with reliable data storage that can be automatically synced across multiple devices and shared among a group of users. The main problem with these services is that they rely on the client-server communication paradigm to make their content available, which, depending on the scenario, may introduce a huge network overhead.

To minimize network overhead, cloud storage services use a variety of techniques such as binary diffs on file chunks, file bundling and data compression, among others. One of these techniques is sync deferment, which consists of aggregating several file mutations in a single message to improve network communication. In practice, it has been implemented using a fixed time threshold of $T$ seconds (Dropbox) [1]: once $T$ seconds have elapsed, the client triggers an update to the server. The disadvantage of fixed sync deferment is that it only suits a limited range of usage scenarios. For instance, consider a collaborative document editing scenario where files are modified very frequently. In Dropbox alone, for 8.5% of its users, sync traffic caused by frequent modifications represents more than 10% of their total network traffic [2]. In settings like this, a “smarter” sync deferment mechanism is necessary to decrease the overhead generated by the amount of superfluous data sent by clients over time.

In particular, the authors of a recent measurement study of cloud storage services [1] suggest using adaptive sync deferment techniques to overcome the limitations of static techniques.
2 Adaptive Sync Deferment (ADS)


The adaptive sync deferment algorithm introduced in [1] adaptively tunes the sync deferment time $T_i$ to follow the latest file update. Simply put, when updates happen more frequently, $T_i$ becomes shorter to keep pace with the higher update rate; when they happen less frequently, it gets longer. To achieve this, $T_i$ is adapted in the following simple iterative manner:

$$T_i = \min\left(\frac{T_{i-1}}{2} + \frac{\Delta t_i}{2} + \varepsilon,\; T_{max}\right) \qquad (1)$$

where $\Delta t_i$ is the inter-update time between the $(i-1)$-th and the $i$-th data updates, and $\varepsilon \in (0, 1.0)$ is a small constant that guarantees that $T_i$ becomes slightly longer than $\Delta t_i$ within a small number of iteration rounds. $T_{max}$ is also a constant, representing the upper bound on $T_i$. Note that an overly long $T_i$ harms the user experience by introducing unacceptably long sync delays.
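A minimal Python sketch of Eq. (1) is shown below, together with a small upload counter. The counter assumes that, after each update, the client waits $T_k$ seconds and syncs only if no new update arrives within that window (a quiet-period timer); this sync semantics, the initial value `t0` for $T_0$ and the default parameter values are our assumptions for illustration.

```python
def ads_timeouts(deltas, eps=0.5, t_max=60.0, t0=0.0):
    """Deferment times T_1..T_n from inter-update times via Eq. (1).

    deltas: inter-update times Δt_i in seconds; t0 is an assumed initial T_0.
    """
    timeouts, t = [], t0
    for dt in deltas:
        t = min(t / 2 + dt / 2 + eps, t_max)
        timeouts.append(t)
    return timeouts


def ads_upload_count(timestamps, eps=0.5, t_max=60.0, t0=0.0):
    """Count uploads, assuming the client syncs once no new update arrives
    within T_k seconds of update k (quiet-period timer)."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    timeouts = [t0] + ads_timeouts(deltas, eps, t_max, t0)  # T_0..T_n
    uploads = 0
    for k in range(len(timestamps)):
        last = k == len(timestamps) - 1
        if last or deltas[k] > timeouts[k]:
            uploads += 1  # timer fired before the next update arrived
    return uploads
```

For a regular pattern with updates every 5 s and $\varepsilon = 0.5$, $T_i$ converges to the fixed point $\Delta t + 2\varepsilon = 6$ s, so after a few warm-up uploads the remaining updates are batched together.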


3 Our Proposal: Rate-based Sync Deferment (RDS) 


Although the authors of [1] demonstrate that, by using the ADS algorithm, Google Drive and OneDrive would achieve a network overhead close to 1 (i.e., negligible), this algorithm does not take into account the number of bytes generated by each update. Data volume is also an important factor. To better understand this, consider a regular file update pattern. While ADS can easily adapt the sync deferment time $T_i$ to this pattern, it would not be able to keep the overhead low if the data volume per update were highly variable, for instance, oscillating sinusoidally over time.

For this reason, we have added the concept of Rate, measured in bytes per second, to the calculations that determine whether or not a batch of updates should be pushed to the cloud. The notion of Rate gives us finer-grained control of the network overhead, which is why we have called our algorithm Rate-based Sync Deferment (RDS).

More specifically, $Rate_i$ is calculated from the number of bytes changed since the last update, $\Delta B_i$, and the inter-update time $\Delta t_i$:

$$Rate_i = \frac{\Delta B_i}{\Delta t_i}.$$

Note that $\Delta B_i$ is easy to calculate by comparing the file chunks at each update time $t_i$. Equipped with this information, $T_i$ is adapted iteratively as follows:

$$T_i = \min\left(T_{Rate_i},\; T_{max}\right) \qquad (2)$$

where $T_{max}$ is a constant representing the upper bound on $T_i$ (i.e., the upper bound on the unsynced time) and $T_{Rate_i}$ is computed as follows:

$$T_{Rate_i} = \alpha \, T_{Rate_{i-1}} + (1 - \alpha) \, TR_i + \varepsilon,$$

where $TR_i = B / Rate_i$, $B$ is the maximum number of bytes required to meet our targeted overhead objective, $\varepsilon$ is a small constant and $\alpha$ is the weighting factor of an exponentially weighted moving average (EWMA). More specifically, $\alpha$ determines the agility of the time estimator $T_{Rate_i}$ in following abrupt changes in the actual data rate or, conversely, its stability in ignoring short-term variations.
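The RDS update rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each update is given as a pair $(\Delta t_i, \Delta B_i)$ with $\Delta t_i > 0$, and, as a safeguard of our own that is not part of the original rule, $TR_i$ is capped at $T_{max}$ (and set to $T_{max}$ when no bytes changed) so that the EWMA never becomes infinite. The default values of `alpha`, `eps` and `t_max` are illustrative.

```python
def rds_timeouts(updates, target_bytes, alpha=0.8, eps=0.1, t_max=60.0, t_rate0=0.0):
    """Sync deferment times T_i for RDS (Eq. (2) with the EWMA estimator).

    updates: iterable of (delta_t, delta_bytes) pairs, one per data update,
             with delta_t > 0 seconds and delta_bytes >= 0.
    target_bytes: B, bytes per sync needed to meet the overhead objective.
    """
    timeouts = []
    t_rate = t_rate0                      # T_Rate_{i-1}
    for dt, db in updates:
        rate = db / dt                    # Rate_i in bytes per second
        # TR_i = B / Rate_i; capping it at T_max (and using T_max when no
        # bytes changed) is our own safeguard, not part of the original rule.
        tr = min(target_bytes / rate, t_max) if rate > 0 else t_max
        t_rate = alpha * t_rate + (1 - alpha) * tr + eps   # EWMA step
        timeouts.append(min(t_rate, t_max))                # Eq. (2)
    return timeouts
```

For instance, a steady 2,000 B/s with $B$ = 50,000 bytes gives $TR_i = 25$ s, and with $\varepsilon = 0$ the estimator settles at 25 s regardless of $\alpha$; a larger $\alpha$ only makes it get there more slowly.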


4 Preliminary results 


To evaluate the effectiveness of RDS, we have conducted several experiments on both real-life and synthetic workloads, comparing RDS with the ADS algorithm.


Workloads. We have considered the following workloads: 


• UB1. We used a real log file of users' file synchronization activity during a whole day on the Ubuntu One (UB1) platform [3]. Only users with relevant activity periods were chosen for the tests. By “activity period” we mean the time elapsed between the first and the last data update in the day. Users with activity periods of less than 1 hour were considered irrelevant for our study. Further, activity periods were split into sessions. We considered a session to be a sequence of file updates in which the inter-update time $\Delta t_i$ was lower than 900 seconds. When this condition was not satisfied, the current session was considered “closed”, and the next data update received became the first one of a new session.

We extracted two sessions: a regular pattern with a coefficient of variation (CV) of the data volume per update lower than 1 (Session A), and a pattern with high variability, i.e., with CV > 1 (Session B).

• Synthetic. As another pattern, we artificially generated a triangular pattern that triggers a file mutation every 5 seconds. More specifically, an initial write of 10 KB is performed, and subsequently 10 KB of new data is appended at every update until the file size reaches 80 KB. From that point onwards, the file is reduced by 10 KB per update until its size is 10 KB again.

This is repeated over time, mimicking a triangular signal.
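The workload preparation steps above can be sketched in Python as follows. The thresholds (1 hour of activity, 900 s session gap, 5 s period, 10 to 80 KB) come from the text; the function names, the timestamp representation in seconds and the use of the population standard deviation for the CV are our assumptions.

```python
import statistics


def is_relevant_user(update_times, min_activity_s=3600.0):
    """Keep users whose activity period (first to last update) is >= 1 h."""
    return max(update_times) - min(update_times) >= min_activity_s


def split_sessions(update_times, max_gap_s=900.0):
    """Split update timestamps into sessions; a gap >= 900 s closes a session."""
    sessions = []
    for t in sorted(update_times):
        if sessions and t - sessions[-1][-1] < max_gap_s:
            sessions[-1].append(t)
        else:
            sessions.append([t])  # first update of a new session
    return sessions


def coefficient_of_variation(volumes):
    """CV of the data volume per update (population std / mean)."""
    return statistics.pstdev(volumes) / statistics.fmean(volumes)


def triangular_workload(n_updates, period_s=5, step_kb=10, min_kb=10, max_kb=80):
    """(timestamp, file size in KB) pairs for the synthetic triangular pattern."""
    trace, size, direction = [], min_kb, +1
    for k in range(n_updates):
        trace.append((k * period_s, size))
        if size >= max_kb:
            direction = -1      # start shrinking after reaching 80 KB
        elif size <= min_kb:
            direction = +1      # start growing again at 10 KB
        size += direction * step_kb
    return trace
```

A session would then be labelled regular (like Session A) when `coefficient_of_variation` of its per-update volumes is below 1, and highly variable (like Session B) otherwise.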


Results. Table 1 reports the results for all three workloads. The metrics used for the comparison were the resulting network overhead, the number of uploads to the cloud servers during a session, and the average sync deferment time obtained in practice.

As shown in Table 1, ADS does not work well in all cases: there are real and synthetic workloads for which the sync deferment time becomes extremely long, which impacts user experience very negatively and increases the frequency of user conflicts when users concurrently edit a file. In contrast, RDS delivers a comparable network overhead in all the scenarios while keeping a good synchronization level, making RDS more stable and responsive. To better understand this, Fig. 1 shows the instantaneous overhead for every update in Session B. As can be seen in this figure, RDS waits less before triggering an update to the cloud, exhibiting a smoother behavior.

30 Raúl Saiz-Laudó

Fig. 1: Session B. Evolution of the network overhead over time (12:00 to 13:45), marking RDS uploads and ADS uploads.

Table 1: Comparison between ADS and RDS.

Workload    Algorithm   Overhead   # Uploads   Sync Deferment Time
Session A   RDS         1.0137     123         33.447 seconds
            ADS         1.0456     164         40.650 seconds
Session B   RDS         1.1416     99          13.120 seconds
            ADS         1.1159     36          351.34 seconds
Synthetic   RDS         1.2038     4664        7.9991 seconds
            ADS         1.3513     3           [illegible]
Abstract. We tackle the problem of privacy-preserving statistical computation in the cloud. The goal is to use the cloud not only to store sensitive data but also to perform computations on them. Specifically, we focus on protocols to obtain the sample covariance matrix of a sensitive numerical data set, and on protocols to obtain the contingency matrix and the distance covariance matrix of a sensitive categorical data set, calculations that underlie most statistical analyses. The multi-cloud is assumed to be semi-honest, that is, it follows the protocols but is not authorized to learn the sensitive data. We rely on the use of several clouds: if these can be assumed not to collude, we use vertical data splitting among clouds; if clouds may collude, we present two alternative protocols that withstand collusion at the expense of increased cloud storage.


1 Introduction 


Data have become a crucial asset of many enterprises, organizations and public administrations. Collecting and analyzing large amounts of data related to individuals not only improves research but also drives tremendous business [13]. However, local storage and processing of such big data is often infeasible for data controllers because of the associated costs (software, hardware, energy, maintenance). An attractive possibility for a data controller is to outsource data to a cloud [2]. This brings several benefits, such as large and highly scalable storage/computation resources at a low cost and with ubiquitous access. On the other hand, concerns about security and privacy still have a detrimental impact on the adoption and acceptance of cloud services: neither users nor companies want the cloud service provider (CSP) to read, use or sell their data.

In this context, the need emerges for secure, efficient and privacy-preserving methods to store and process the (sensitive) data outsourced to the cloud. This is precisely the main goal of the European project CLARUS [5], whose solution consists of a proxy located in a domain trusted by the data controller (e.g., a server in her company's intranet or a plug-in in her device) that implements security and privacy-enabling features towards the CSP so that i) the CSP only receives privacy-protected versions of the controller's data, ii) CLARUS makes access to such data transparent to the controller's users (by adapting their queries and reconstructing the results retrieved from the cloud) and iii) users can still leverage the cloud to perform accurate computations on the outsourced data without downloading them.

To do so, CLARUS particularly relies on data splitting as a data protection 
technique: data are partitioned into several fragments, each of which is stored 
in the clear in a cloud provided by a different CSP [4]. Data splitting is an 
alternative that is more efficient and functionality-preserving than encryption- 
based methods (e.g., CipherCloud, PerspecSys, SecureCloud, etc.). In general, 
even though searchable and homomorphic encryption allow performing some 
operations on ciphertext [7], computing on encrypted data is extremely limited 
and costly [9], and it requires careful management of encryption keys. In 
contrast, the vertical data splitting implemented by CLARUS protects privacy 
(confidential information on an individual is partitioned into fragments that 
cannot be linked) and allows computation to be performed on clear data. 
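As a rough illustration of vertical data splitting, here is a minimal Python sketch in which a table (a list of records) is partitioned column-wise into one fragment per cloud, and the proxy re-links the fragments by row position. Keeping rows in the same order in every fragment is a simplification of ours: it is only safe under the non-collusion assumption, and a real deployment might instead use record identifiers known only to the proxy. The attribute names in the usage example are hypothetical.

```python
def vertical_split(records, fragments):
    """Split a table column-wise: one fragment (list of rows) per cloud.

    `records` is a list of dicts; `fragments` lists the attribute names that
    go to each cloud. Each fragment is stored in the clear by a different CSP;
    clouds are assumed not to collude, so no single CSP sees a full record.
    """
    return [[{a: r[a] for a in attrs} for r in records] for attrs in fragments]


def rejoin(fragment_tables):
    """Proxy-side reconstruction: merge the i-th row of every fragment."""
    return [
        {k: v for frag in rows for k, v in frag.items()}
        for rows in zip(*fragment_tables)
    ]
```

For example, `vertical_split(records, [["name", "zip"], ["diagnosis"]])` would store identifying attributes in one cloud and the sensitive attribute in another, so neither cloud alone can link a diagnosis to a person.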


2 Obtained Results 


In [3], we evaluated several non-cryptographic proposals for statistical com- 
putation (basically correlations) on split data, and we enhanced and proposed 
some protocols adapted to the CLARUS scenario. In [10], we extended these 
results by also considering cryptographic protocols and by relaxing the non-collusion assumption. We first assumed that the CSPs do not collude to reconstruct the original data from the fragments, and we presented two protocols for this setting. We then relaxed the non-collusion assumption and presented two collusion-resistant protocols, even though they require substantial cloud storage (because they rely on data replication rather than splitting). In both articles, we focused on the computation of the sample covariance matrix, because many statistical analyses, such as regression, classification and principal component analysis, are based on it.

All these protocols and methods were designed for numerical data. How- 
ever, many of the (personal) data currently collected from a variety of sources 
(social networks, surveys, B2C transactions, etc.) are not numerical. In [11], we 
adapted some of the methods proposed for split numerical data to categorical 
data. Specifically, we described two protocols (with and without cryptography, respectively) to compute the contingency table (used for the $\chi^2$-test of independence [1]) and the distance covariance matrix (based on the more recent distance covariance/correlation measure, see [12]) needed to measure the correlation between two categorical attributes stored in different clouds.
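To see why a contingency table fits the scalar-product approach, note that each count $n_{uv}$ (records with $X = u$ and $Y = v$) is the scalar product of the indicator vector of $X = u$, held by the cloud storing $X$, and the indicator vector of $Y = v$, held by the cloud storing $Y$. The minimal Python sketch below computes these products in the clear purely for illustration; in the protocols each `dot` call would be replaced by a secure scalar product. The $\chi^2$ statistic is the standard Pearson statistic; the function names are ours.

```python
def indicator(column, value):
    """0/1 vector marking the records whose attribute equals `value`."""
    return [1 if x == value else 0 for x in column]


def dot(u, v):
    """Plain scalar product (a secure scalar product in the real protocol)."""
    return sum(a * b for a, b in zip(u, v))


def contingency_table(col_x, col_y):
    """Counts n[u][v], each obtained as a scalar product of two indicators."""
    xs, ys = sorted(set(col_x)), sorted(set(col_y))
    table = [[dot(indicator(col_x, u), indicator(col_y, v)) for v in ys]
             for u in xs]
    return xs, ys, table


def chi_square(table):
    """Pearson chi-square statistic of independence for a contingency table."""
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            exp = row[i] * col[j] / n   # expected count under independence
            stat += (obs - exp) ** 2 / exp
    return stat
```

A table with $k_X$ and $k_Y$ categories therefore needs $k_X \cdot k_Y$ scalar products between the two clouds.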
In all these articles, we compared the computational and communication costs of the described protocols against a benchmark consisting of the CLARUS proxy downloading the entire data set and computing locally on it. If clouds can be assumed not to collude, data splitting is probably the best choice, owing to its simplicity and flexibility.
Fig. 1: Protocols for the secure scalar product. The top charts correspond to the non-colluding scenario and the bottom charts to the collusion-resistant scenario.


3 Computation on vertically partitioned data 


In vertical splitting, analyses that involve only attributes in a single fragment are fast and easy to compute: the cloud storing the fragment can compute and send the output of the analysis to the CLARUS proxy. Unfortunately, the sample covariance matrix, the contingency matrix and the distance covariance matrix involve attributes stored in different fragments, and thus require communication between clouds. Obtaining the sample covariance matrix (the contingency matrix and the distance covariance matrix, respectively) under vertical splitting among several clouds can be decomposed into several secure scalar products to be conducted between pairs of clouds (see [10] and [11]). Secure scalar products can be based on cryptography (the protocol in [8] involves homomorphic encryption) or not (the protocols in [6] modify the data before sharing them in such a way that the original data cannot be deduced from the shared data but the final results are preserved). See Figure 1 for a sketch of the secure scalar product protocols.
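To illustrate the non-cryptographic family, here is a minimal Python sketch of one well-known protocol of this kind, Du and Atallah's commodity-server scalar product; it is not necessarily the protocol of [6]. A helper party (the CLARUS proxy could play this role, which is our assumption) hands out correlated randomness, the two clouds exchange masked vectors, and each ends up with an additive share of $x \cdot y$. Floating-point masks are used for readability only; a real implementation would work over integers modulo a large number to obtain proper hiding. The last function shows how the proxy can turn the scalar product into a sample covariance using sums that each cloud computes locally.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def commodity_server(n, rng):
    """Helper party: random vectors Ra, Rb and scalars with ra + rb = Ra . Rb."""
    Ra = [rng.uniform(-1e3, 1e3) for _ in range(n)]
    Rb = [rng.uniform(-1e3, 1e3) for _ in range(n)]
    ra = rng.uniform(-1e3, 1e3)
    rb = dot(Ra, Rb) - ra
    return (Ra, ra), (Rb, rb)


def secure_scalar_product(x, y, rng):
    """Cloud A holds x, cloud B holds y; returns shares v1 + v2 = x . y."""
    (Ra, ra), (Rb, rb) = commodity_server(len(x), rng)
    x_hat = [a + r for a, r in zip(x, Ra)]   # A -> B: x masked by Ra
    y_hat = [b + r for b, r in zip(y, Rb)]   # B -> A: y masked by Rb
    v2 = rng.uniform(-1e3, 1e3)              # B's random output share
    u = dot(x_hat, y) + rb - v2              # B -> A
    v1 = u - dot(Ra, y_hat) + ra             # A's output share
    return v1, v2                            # both sent to the proxy


def cross_covariance(xy, sum_x, sum_y, n):
    """Sample covariance from the scalar product and locally computed sums."""
    return (xy - sum_x * sum_y / n) / (n - 1)
```

The masking cancels because $v_1 = x \cdot y + r_a + r_b - R_a \cdot R_b - v_2 = x \cdot y - v_2$. Each off-diagonal entry of the covariance matrix then needs one such product between the two clouds holding the attributes.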
1 Introduction


The search for a lost target in a spatial domain is a physical process that appears in many different contexts [1,3]. Search Theory is a broad field of applied mathematics that delves into the study of search strategies, ranging from the analytical derivation of optimal trajectories [6] to the study of the performance and behavior of strategies existing in nature [1,2,3].

The current work explores the search problem from a numerical approach. The construction of a Bayesian Search Algorithm with an Information Gain Maximization Criterion provides an adequate framework to analyze the trajectory and performance of several strategies and to identify the different deviations from optimal plans [2]. We relate the information-processing mechanism to the bounded rationality of the agent [5]: the searcher, as in real situations, shows cognitive biases when dealing with problems involving incomplete information.


2 The Bayesian Model 


The search problem is treated as an information problem. The uncertainty of the system is captured in a probability distribution function $P(x, y)$ for the target location in a given domain. The agent conducts the search by constantly updating the probabilistic map after unsuccessful local searches, applying Bayesian inference as new evidence is acquired [1]:

$$P(x_j, y_j) = \prod_{i=0}^{j-1} \left[1 - P_D(x_i, y_i)\right] P(x_0, y_0) \qquad (1)$$

where $P(x_j, y_j)$ measures the current probability after $j$ iterations, and $P_D(x_i, y_i)$ is the detection probability, which decreases exponentially with the distance between the target and the agent's current position.
A movement criterion is required to choose the next search regions. We impose the optimal search policy derived by E. T. Jaynes in the context of Information Theory [4]. It states that the searcher should maximize the information gain about the target, so as to exploit the prior knowledge as fast as possible and obtain the maximum saving in search effort:

$$\Delta S = S_0 - S_1 = -\sum_{\text{cells}} P_0 \ln(P_0) + \sum_{\text{cells}} P_1 \ln(P_1) \qquad (2)$$

In the algorithm, the agent samples $S$ new positions and moves towards the one that maximizes the difference in Shannon entropy $\Delta S$ between the current and the future state of the search [1,4].
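The update of Eq. (1) and the entropy criterion of Eq. (2) can be combined into one search step. The following minimal Python sketch works on a grid of cells stored as a dictionary. Several details are our assumptions rather than the authors' specification: the detection probability has the form $p_0 e^{-d/\lambda}$ (the text only says it decays exponentially), the posterior is renormalised after each miss, and the future entropy is taken as an expectation over finding (entropy 0) or missing the target, as in the infotaxis literature cited as [1].

```python
import math


def detection_prob(cell, agent, p0=0.9, length=1.0):
    """P_D for a target in `cell` when searching from `agent`; decays
    exponentially with distance (p0 and length are assumed parameters)."""
    return p0 * math.exp(-math.dist(cell, agent) / length)


def posterior_after_miss(prior, agent):
    """One factor of Eq. (1): multiply by (1 - P_D) and renormalise."""
    post = {c: p * (1.0 - detection_prob(c, agent)) for c, p in prior.items()}
    z = sum(post.values())
    return {c: p / z for c, p in post.items()}


def entropy(dist):
    """Shannon entropy S = -sum P ln P over the cells."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def expected_gain(prior, agent):
    """Expected ΔS when searching from `agent`: a find removes all
    uncertainty, a miss leaves the entropy of the updated map."""
    s0 = entropy(prior)
    p_hit = sum(p * detection_prob(c, agent) for c, p in prior.items())
    s_miss = entropy(posterior_after_miss(prior, agent))
    return p_hit * s0 + (1.0 - p_hit) * (s0 - s_miss)


def choose_next(prior, candidates):
    """Among the S sampled candidate positions, pick the one maximising ΔS."""
    return max(candidates, key=lambda pos: expected_gain(prior, pos))
```

After an unsuccessful search at a position, probability mass moves away from the searched cells, which is what drives the agent towards unexplored regions in subsequent steps.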
Fig. 1: Difference between information gain maximization and random strategies.


Ultimately, the model attempts to describe the real cognitive process during the search: the agent searches in the most probable region, updates its prior beliefs and decides where to move next, until the target is found [3].


2.1 Parametrization and computational costs 


In this model, all the parameters involved in the search can be quantified and classified into dimensional (scale of the problem), strategic (jump width and sampling constraints) and computational (number of cells, samples and iterations) parameters. The computational costs of the algorithm can be related to the search time in a real process. When the total time $Q$ is fixed, a trade-off appears [2]: the information-processing mechanism requires some effort to sample the $S$ new positions, at the expense of the number of iterations $J$. We found an analytical expression that relates the effort $Q$ to the computational parameters and shows that an agent's strategy requires a specific effort allocation to conduct the search.


3 The Numerical Study 


3.1 State of Knowledge and Effort Allocation 


We attempt to reproduce the behavior of a real searcher, who is expected to show deviations from optimal plans. The deviations might come from the State of Knowledge K (choice of strategic parameters and movement criterion) and the Effort Allocation D (amount of information processing in terms of sampling).
Fig. 2: Deviations from optimal strategies in the plane (K,S).


where K = 0 is a reference point for strategies with no available information (Random Search), and K ≠ 0 for strategies that use some (valuable or not) information. The numerical study seeks to identify the sources of error coming from bad estimates of the search parameters (K < 0) or from an inadequate allocation of the effort (risk-averse and risk-loving behaviors).


3.2 Trajectory and Performance Analysis 


We numerically analyze the impact of several search parameters under different initial distributions and sampling strategies. The parametric analysis allows us to find the optimal parameter values and to detect the different sources of error in the path planning [2]. The results show that an appropriate parametrization can increase the performance of strategies guided by a prior expectation when some useful information about the system is available. However, incorrect estimates of the initial distribution or of the search parameters can lead to very poor performance, even worse than that of purely random strategies, which do not follow any movement criterion or sampling method.
3.3 A Search Game for Four Players


We build four players with different States of Knowledge (from a Pro Player with optimal parameter values to a completely Random Player) in order to better understand the cognitive biases in a search strategy.

Three scenarios are defined, in which a large number of targets are hidden in consecutive searches with a fixed time or effort Q, and the performance of the players is measured for different effort allocations (i.e., numbers of samples S).
Fig. 3: Players' performance for different effort allocations in the Gaussian scenario.


The Bayesian model applied in these games validates the hypothesis of the
work and shows that each strategy has a specific optimal Effort Allocation
Point, which may change depending on the State of Knowledge of the agent and
the uncertainty of the system. We have empirically shown the trade-off between
information-processing and randomness in a search strategy, where the
rationality of the agent is bounded by its theoretical knowledge of the
problem and by the global effort constraints [2,5].
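As a minimal illustration of the kind of Bayesian update such search strategies build on, the sketch below applies the classical posterior update after an unsuccessful look (as in the theory of optimal search [6]) to a discretized search space, and then searches greedily. The discretization, the constant detection probability `p_detect` and the greedy "most probable cell" rule are assumptions of this sketch; it is not the model of [2], and it ignores movement costs and effort allocation.

```python
import random


def update_after_miss(prior, cell, p_detect):
    """Posterior over cells after an unsuccessful look at `cell`.

    Bayes' rule with miss likelihood (1 - p_detect) for the searched cell
    and 1 for every other cell, normalized by the total miss probability.
    """
    miss_prob = 1.0 - prior[cell] * p_detect
    posterior = [p / miss_prob for p in prior]
    posterior[cell] = prior[cell] * (1.0 - p_detect) / miss_prob
    return posterior


def greedy_search(prior, target, p_detect, budget, rng):
    """Look at the currently most probable cell until the target is detected
    or the effort budget (number of looks) runs out.
    Returns the number of looks used, or None if the target was not found."""
    belief = list(prior)
    for look in range(1, budget + 1):
        # with a uniform p_detect, the most probable cell is also the one
        # with the highest probability of detection
        cell = max(range(len(belief)), key=lambda j: belief[j])
        if cell == target and rng.random() < p_detect:
            return look
        belief = update_after_miss(belief, cell, p_detect)
    return None


if __name__ == "__main__":
    prior = [0.1, 0.6, 0.3]
    print(update_after_miss(prior, 1, 0.8))
    print(greedy_search(prior, 2, 0.8, 20, random.Random(0)))
```

After a miss in the most probable cell, probability mass flows to the other cells, which is what makes an uninformative or wrong prior costly: the searcher keeps looking where the target is not.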


References 


[1] C. Barbieri, S. Cocco, R. Monasson. On the trajectories and performance of
Infotaxis. EPL, 94, 20005, 2011.


[2] D. Campos, V. Mendez, J. Palmer and F. Bartumeus. Path planning in the
light of random search theory: coping with human errors and uncertainty.
Unpublished, 2015.


[3] A. Calhoun, S. Chalasani, T. Sharpee. Maximally informative foraging by
Caenorhabditis elegans. eLife, 3, e04220, 2014.


A Numerical Study on Bayesian Search Strategies 43 


[4] E.T. Jaynes. Entropy and Search Theory. First Maximum Entropy Workshop, 
University of Wyoming. 1981. 


[5] H.A. Simon. Rational choice and the structure of the environment.
Psychological Review, 63(2), 129-138, 1956.


[6] L.D. Stone. Theory of optimal search. Operations Research Society of America, 
Arlington, Virginia. ORSA Books, 1989. 


Clinical Decision Support System for Diabetic 
Retinopathy Risk Evaluation 


Emran Saleh * 


Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili 
Tarragona, Spain 
emran.saleh@estudiants.urv.cat 


1 Introduction 


Diabetic retinopathy (DR) is the main cause of cumulative damage to the
retina in diabetic patients and the leading cause of vision loss among
working-age adults. As the prevalence of diabetes grows, so does the number
of people suffering from DR, which makes it a major concern for health care
centres. Frequent and early examination of the eye fundus with non-mydriatic
fundus cameras may minimize the risk of developing blindness, as well as the
economic impact of treatment [2]. Unfortunately, because of the large number
of diabetic patients, it is not feasible to perform preventive screening on
all of them.

Physicians of the Ophthalmology Department of Sant Joan de Reus University
Hospital (SJRU) conducted a statistical and clinical study which showed that
8% to 9% of diabetic patients developed DR [1]. Taking into account this low
proportion, also supported by other studies, the physicians concluded that
some patients could be safely screened every 2 or 3 years, and that it would
be better to focus resources on the patients with a higher risk of developing
DR.

The objective of this work is to construct a Clinical Decision Support System
(CDSS) that helps clinicians make a primary diagnosis and evaluate the risk of
developing DR [3]. We propose a CDSS based on an ensemble of decision trees
called Fuzzy Random Forest (FRF). The goal of this model is to classify new
patients as healthy (no risk of DR) or sufferers (some sign of DR).


2 The data 


The method proposed in the next section has been applied to data stored in
the Electronic Health Records of patients, which were methodically collected
by the Ophthalmology Department of SJRU Hospital. The hospital provided 2323
records of diabetic patients: 579 records of patients with DR (class 1) and
1744 records of healthy patients (class 0).

To train and test the model we used the most relevant attributes, according
to the previous study of the physicians [1]. Some attributes are numerical
(e.g. Age and Body Mass Index), while others are categorical (e.g. Sex).
As the proposed model is based on fuzzy logic, the input data must be
transformed into fuzzy sets: each numerical attribute is fuzzified into
linguistic terms that are meaningful to the physicians.
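To illustrate how a numerical attribute can be fuzzified into linguistic terms, the sketch below uses trapezoidal membership functions for Age. The term names ("young", "middle-aged", "old") and all breakpoints are hypothetical; the actual terms were defined with the physicians and are not reported here.

```python
INF = float("inf")

# Hypothetical linguistic terms for Age: (a, b, c, d) trapezoid breakpoints.
# Adjacent terms overlap so that the memberships always sum to 1.
AGE_TERMS = {
    "young": (-INF, -INF, 40, 55),
    "middle-aged": (40, 55, 65, 75),
    "old": (65, 75, INF, INF),
}


def trapezoid(x, a, b, c, d):
    """Membership is 1 on [b, c], rises linearly on (a, b), falls on (c, d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def fuzzify(x, terms):
    """Membership degree of value x in every linguistic term."""
    return {name: trapezoid(x, *params) for name, params in terms.items()}


if __name__ == "__main__":
    print(fuzzify(50, AGE_TERMS))  # partly "young", mostly "middle-aged"
```

A patient aged 50 thus belongs to "young" with degree 1/3 and to "middle-aged" with degree 2/3, and both memberships are carried into the tree induction instead of a single crisp interval.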


3 Methods 


The model proposed in this work is a fuzzy random forest (FRF), which
consists of an ensemble of fuzzy decision trees (FDTs) used as classifiers.
Each branch of these trees is a rule that makes a decision. Different
bootstrap samples of the training dataset are used to ensure variety when
constructing the fuzzy decision trees. To further increase diversity, a random
subset of all the attributes is selected to split each node.


3.1 Random Forest Construction 
The following are the essential steps to generate a random forest:


1. Pick random samples of the training examples (bootstrap). The size of each
bootstrap should be around 2/3 of the training dataset, and the class balance
in the bootstrap must be taken into account.

2. Use each bootstrap to build a fuzzy decision tree (Section 3.2). To
determine a new split of a tree node during the tree-building process, a
random subset of the attributes of size γ is taken.

3. Repeat steps 1 and 2 until all n fuzzy decision trees of the random forest
have been built.
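The construction loop above can be sketched as follows. Bootstraps are drawn with replacement, taking the same number of examples from each class so that the total is about 2/3 of the training set; this is only one possible reading of the class-balance requirement. `build_fdt` is a hypothetical placeholder for the fuzzy decision tree induction of Section 3.2, and `gamma` stands for the size of the random attribute subset.

```python
import random


def balanced_bootstrap(examples, labels, fraction=2 / 3, rng=random):
    """Class-balanced bootstrap (sampling with replacement) whose total size
    is about `fraction` of the training data."""
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append((x, y))
    per_class = max(1, round(fraction * len(examples) / len(by_class)))
    sample = []
    for members in by_class.values():
        sample.extend(rng.choices(members, k=per_class))
    rng.shuffle(sample)
    return sample


def random_attribute_subset(attributes, gamma, rng=random):
    """Attributes considered when splitting one node (size gamma)."""
    return rng.sample(attributes, min(gamma, len(attributes)))


def build_forest(examples, labels, attributes, n_trees, gamma, build_fdt, rng=random):
    """Repeat bootstrap + tree induction n_trees times.
    build_fdt(sample, attributes, gamma, rng) is a hypothetical induction
    routine expected to call random_attribute_subset at every split."""
    forest = []
    for _ in range(n_trees):
        sample = balanced_bootstrap(examples, labels, rng=rng)
        forest.append(build_fdt(sample, attributes, gamma, rng))
    return forest
```

With the data of Section 2 (579 DR and 1744 healthy records), balancing the bootstrap means that DR records are sampled with many repetitions, which counteracts the class imbalance at the cost of duplicated examples.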


3.2 Fuzzy Decision Tree Induction 


Many fuzzy decision tree induction methods have been proposed in the
literature. In this work we use the fuzzy decision tree induction algorithm
proposed by Yuan and Shaw [4].

The steps of the induction process are the following: 


1. Generate a random subset of attributes of size γ, then select the best
attribute v for the root node: the one with the smallest classification
ambiguity.

2. Create a new branch for each value of the attribute v for which there are
examples with support of at least α.

3. Calculate the truth level of classifying the examples of each branch into
each class.




4. If, for at least one class, the truth level of classification is higher
than β, terminate the branch with a leaf labelled with the class with the
highest truth level.

5. If the truth level is lower than β for all the classes, check whether a new
decision node can reduce the classification ambiguity.

6. If there are attributes that reduce the classification ambiguity, select
the one with the smallest classification ambiguity given the accumulated
evidence as a new decision node of the branch. Repeat from step 2 until no
further growth is possible.

7. If no attribute can reduce the classification ambiguity, terminate the
branch as a leaf labelled with the class with the highest truth level.


The parameters used in the induction process are:


The significance level (α), used to filter out evidence that is not relevant
enough: if the membership degree of the evidence is lower than α, it is not
used in the induction process.

The truth level threshold (β), which determines the minimum truth level of the
conclusions obtained by the rules.
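The quantities used in the induction steps can be sketched with our reading of the definitions of Yuan and Shaw [4]: the truth level of a rule "branch → class" is the fuzzy subsethood of the class in the branch evidence, and classification ambiguity is the non-specificity of the normalized possibility distribution over the classes. The α filter and the β test follow the descriptions above; representing evidence and classes as lists of per-example membership degrees is an assumption of this sketch, and the data in the usage example are made up.

```python
import math


def alpha_cut(memberships, alpha):
    """Ignore evidence whose membership degree is below the significance level alpha."""
    return [m if m >= alpha else 0.0 for m in memberships]


def truth_level(evidence, class_membership):
    """Fuzzy subsethood S(E, C) = sum_u min(E(u), C(u)) / sum_u E(u)."""
    total = sum(evidence)
    if total == 0:
        return 0.0
    return sum(min(e, c) for e, c in zip(evidence, class_membership)) / total


def ambiguity(truths):
    """Non-specificity of pi_i = S_i / max_j S_j:
    g(pi) = sum_i (pi*_i - pi*_{i+1}) * ln(i), with pi* sorted decreasingly."""
    top = max(truths)
    if top == 0:
        return math.log(len(truths))  # no support for any class: total ignorance
    pi = sorted((t / top for t in truths), reverse=True) + [0.0]
    return sum((pi[i] - pi[i + 1]) * math.log(i + 1) for i in range(len(truths)))


def evaluate_branch(evidence, classes, alpha, beta):
    """Truth level of the branch for every class. Returns a leaf label when
    some truth level exceeds beta, otherwise None (keep expanding)."""
    ev = alpha_cut(evidence, alpha)
    truths = {label: truth_level(ev, mu) for label, mu in classes.items()}
    best = max(truths, key=truths.get)
    return (best if truths[best] > beta else None), truths


if __name__ == "__main__":
    classes = {"DR": [1.0, 0.7, 0.0, 0.0], "healthy": [0.0, 0.3, 1.0, 1.0]}
    print(evaluate_branch([1.0, 0.8, 0.1, 0.6], classes, alpha=0.2, beta=0.7))
```

An ambiguity of 0 means that one class clearly dominates, while ln(n) for n classes means that the branch cannot discriminate between them, which is why induction prefers the attribute with the smallest value.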
Fig. 1: Classification of a new observation in the Fuzzy Random Forest.


3.3 Random Forest Classification


In a random forest, each rule of each tree gives a predicted class for an
observation. Many techniques exist to combine these predictions into the final
decision of the forest. The method used in this work classifies an observation
as follows (see Figure 1):
1. The new observation is fed into the root node of each tree of the forest.
Each branch (i.e. rule) of a tree gives an inference consisting of the class
label of its leaf, L_xy, and a membership degree μ.

2. Aggregate the inferences of all the leaves of each tree to obtain a single
decision (class label) per tree. The Mamdani inference procedure is used to
predict the final class of each FDT: 1) calculate the satisfaction degree of
a rule using the minimum t-norm; 2) calculate the membership to the conclusion
class by multiplying the degree of support of the rule by its satisfaction
degree; 3) aggregate the memberships of the same class using the maximum
t-conorm. To obtain a single final decision of a tree for an observation, we
compare the highest membership degrees of the classes in order to check
whether their difference is large enough to choose a final class. If the
difference between the membership degrees is higher than a given threshold
δ1, the class label with the highest membership value is chosen; otherwise,
the class is "Unknown".

3. To obtain the final inference of the random forest, count the number of
trees predicting each class. Choose the class label with the majority of votes
if the difference between the votes of the two most voted classes is greater
than or equal to a given threshold δ2; otherwise, the class is "Unknown".
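Step 2 can be sketched for a single tree as follows. Each fired rule is represented by the membership degrees of its antecedent conditions, its class label and its degree of support; this representation and the example numbers are assumptions of the sketch. The satisfaction degree uses the minimum t-norm, the class membership is support times satisfaction, memberships of the same class are aggregated with the maximum t-conorm, and the tree answers "Unknown" unless the top two class memberships differ by more than the threshold δ1 (`delta1`).

```python
UNKNOWN = "Unknown"


def tree_decision(rules, delta1):
    """rules: list of (antecedent_degrees, class_label, support) for one tree.
    Returns (label or UNKNOWN, aggregated class memberships)."""
    memberships = {}
    for degrees, label, support in rules:
        satisfaction = min(degrees)            # minimum t-norm
        mu = support * satisfaction            # membership to the rule's class
        memberships[label] = max(memberships.get(label, 0.0), mu)  # maximum t-conorm
    if not memberships:
        return UNKNOWN, memberships
    ranked = sorted(memberships.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    best = max(memberships, key=memberships.get)
    if ranked[0] - second > delta1:
        return best, memberships
    return UNKNOWN, memberships


if __name__ == "__main__":
    fired = [([0.9, 0.7], 1, 0.8), ([0.3, 0.6], 0, 0.9), ([0.5, 0.4], 1, 1.0)]
    print(tree_decision(fired, delta1=0.1))  # class 1 wins by a margin of 0.29
    print(tree_decision(fired, delta1=0.3))  # margin too small: Unknown
```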


To ensure that observations are classified into a specific class with enough
support, we defined two parameters, δ1 and δ2, in order to detect the cases
where an observation belongs to different classes with similar memberships or
receives a similar number of votes. In these cases, the observation is not
classified (i.e. it is labelled as "Unknown").
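The forest-level vote of step 3 can be sketched in a few lines. Trees whose own decision was "Unknown" are not counted as votes; the text does not say how they are handled, so this is an assumption of the sketch.

```python
from collections import Counter

UNKNOWN = "Unknown"


def forest_decision(tree_votes, delta2):
    """Majority vote over the tree decisions; abstain ("Unknown") when the two
    most voted classes differ by fewer than delta2 votes.
    Trees that answered Unknown are ignored (an assumption)."""
    counts = Counter(v for v in tree_votes if v != UNKNOWN)
    if not counts:
        return UNKNOWN
    ranked = counts.most_common()
    first_label, first = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    return first_label if first - second >= delta2 else UNKNOWN


if __name__ == "__main__":
    votes = [1, 1, 1, 0, UNKNOWN, 1, 0]  # four votes for class 1, two for class 0
    print(forest_decision(votes, delta2=2))  # 1
    print(forest_decision(votes, delta2=3))  # Unknown
```

Raising δ2 trades coverage for reliability: more patients are left as "Unknown" (and could be referred to a specialist), but the ones that are classified have a clearer majority behind them.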
Abstract. Searchable encryption schemes allow users to outsource a dataset in
encrypted form while preserving the ability to query it remotely and privately.
In this work we propose different searchable encryption techniques that support
range queries on two-dimensional geo-referenced data. The proposed techniques
improve on previous works in terms of both efficiency and security.


1 Introduction 


The cloud computing paradigm offers very convenient data storage and
computation services at a low cost, thus providing an attractive alternative
to physical storage and self-managed servers. Nevertheless, even though cloud
computing brings many economic and functional benefits, leaving data in the
hands of an external cloud service provider raises many security and privacy
concerns.

One way to address the security concerns that arise when outsourcing data to
the cloud is to provide users with user-centered cryptographic techniques.
However, outsourcing data encrypted with traditional encryption techniques is
not convenient, since any operation over the dataset must then be carried out
locally. To overcome this obstacle, alternative cryptographic schemes must be
applied.

In recent years there have been important advances in cryptographic
techniques that make it possible to take advantage of the cloud benefits while
securing the data. Two examples are homomorphic encryption and
order-preserving encryption, which allow remote computations and ordering on
encrypted data, respectively.

Searchable encryption [11,3,1,4,7] deals with the problem of remotely
querying encrypted data. Using searchable encryption schemes, it is possible
to outsource a dataset in encrypted form while preserving the search
functionality, by letting users send encrypted queries to the cloud. In this
way, users can remotely and securely query the encrypted data and retrieve
the part of the outsourced dataset that satisfies the query conditions.
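To make the idea concrete, the toy sketch below builds an inverted index whose keywords are replaced by HMAC-SHA256 tokens computed under a client key, so that the server can match a query token without learning the keyword. It is illustrative only: it leaks the search and access patterns, stores document identifiers in the clear, and is not any of the schemes cited above (in particular, not the scheme of Cash et al. [4]). The documents and keywords are made up.

```python
import hashlib
import hmac
import secrets


def token(key: bytes, keyword: str) -> bytes:
    """Deterministic search token for a keyword (HMAC-SHA256 used as a PRF)."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).digest()


def build_index(key: bytes, documents: dict) -> dict:
    """Client side: documents maps doc_id -> set of keywords.
    Returns token -> sorted list of doc ids (this index is outsourced)."""
    index = {}
    for doc_id, keywords in documents.items():
        for kw in keywords:
            index.setdefault(token(key, kw), []).append(doc_id)
    return {t: sorted(ids) for t, ids in index.items()}


def server_search(index: dict, query_token: bytes) -> list:
    """Server side: a plain lookup; the server only ever sees tokens."""
    return index.get(query_token, [])


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # client secret key
    docs = {"d1": {"cafe", "tarragona"}, "d2": {"museum", "tarragona"}}
    index = build_index(key, docs)
    print(server_search(index, token(key, "tarragona")))  # ['d1', 'd2']
```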


2 Searchable Encryption for Geo-Referenced Data 


Our aim is to provide searchable encryption schemes that enable a client 
to delegate an encrypted version of a geo-referenced dataset to a semi-trusted, 
honest-but-curious server, in such a way that searching capabilities over the 
encrypted data are preserved. 

In our setting, the client first delegates an encrypted version of its dataset 
to a server. Such a dataset consists of a collection of documents, each of 
which is attached to a particular geographical point. Afterwards, the same 
client may want to retrieve a subset of the outsourced dataset: by generating
an encrypted query, it can recover the outsourced documents lying inside a
chosen rectangular area.

Based mainly on the works by Shi et al. [10] and by Faber et al. [6], we
develop four searchable encryption techniques that support two-dimensional
range queries over encrypted data. These techniques offer different efficiency
and security trade-offs. The proposed solutions are also general, in the sense
that they can use an arbitrary underlying keyword searchable encryption
scheme; by changing this underlying scheme, different efficiency and security
levels can be achieved.

We analyze the trade-off between performance, security and communication
overhead of the presented options, using the scheme by Cash et al. [4] as the
underlying searchable encryption scheme. Our solutions take advantage of the
Boolean search and inverted index properties of [4].
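A common way to reduce a rectangle query to keyword search, assumed here only for illustration, is to discretize each coordinate into 2^bits cells and tag every point with all the dyadic intervals (binary-tree nodes) that contain each of its coordinates. A range on one axis is then covered exactly by a small set of dyadic intervals, and the rectangle becomes the Boolean query (any x-cover keyword) AND (any y-cover keyword), which a scheme with Boolean search such as [4] can evaluate. The keyword naming and grid size are hypothetical, and this is not necessarily any of the four techniques of this work; `matches` evaluates the query in plaintext only to check the reduction.

```python
def point_keywords(axis, value, bits):
    """Every dyadic interval containing `value`, one per tree level."""
    return {f"{axis}:{lvl}:{value >> lvl}" for lvl in range(bits + 1)}


def range_cover(axis, lo, hi, bits):
    """Minimal set of dyadic intervals whose union is exactly [lo, hi]."""
    cover = []
    while lo <= hi:
        # grow the aligned block starting at lo while it stays inside [lo, hi]
        lvl = 0
        while (lvl < bits and lo % (1 << (lvl + 1)) == 0
               and lo + (1 << (lvl + 1)) - 1 <= hi):
            lvl += 1
        cover.append(f"{axis}:{lvl}:{lo >> lvl}")
        lo += 1 << lvl
    return cover


def rectangle_query(x_lo, x_hi, y_lo, y_hi, bits):
    """CNF query: (OR of x-cover keywords) AND (OR of y-cover keywords)."""
    return [range_cover("x", x_lo, x_hi, bits), range_cover("y", y_lo, y_hi, bits)]


def matches(keywords, cnf):
    """Plaintext evaluation of the CNF query against a point's keyword set."""
    return all(any(kw in keywords for kw in clause) for clause in cnf)


if __name__ == "__main__":
    print(range_cover("x", 1, 6, 3))  # ['x:0:1', 'x:1:1', 'x:1:2', 'x:0:6']
```

The number of cover keywords grows with the number of tree levels, which is the source of the query-size and leakage trade-offs discussed in this work.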

As a novel approach with respect to previous works, we build on alternative 
combinatorial structures to lower the leakage of the schemes, thus improving 
security at the cost of increasing the query size and the search time. We 
also present a technique based on over-covers [6] that notably reduces the 
communication cost and the leakage of the queries at the expense of increasing 
the false-positive rate. 
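The trade-off behind over-covers can be illustrated with its most extreme form: instead of the exact dyadic cover of a range, the query uses the single smallest dyadic interval that contains the whole range. The query then needs one keyword per axis, but every value of that interval lying outside the range is returned as a false positive. This single-interval over-cover is an assumption made for illustration; the over-cover technique of [6] and of this work may bound the cover size differently.

```python
def single_node_overcover(lo, hi):
    """Smallest dyadic interval (level, start, end) containing [lo, hi]."""
    lvl = 0
    while (lo >> lvl) != (hi >> lvl):  # climb until both ends share an ancestor
        lvl += 1
    start = (lo >> lvl) << lvl
    end = start + (1 << lvl) - 1
    return lvl, start, end


def false_positive_values(lo, hi):
    """Number of values returned by the over-cover that lie outside [lo, hi]."""
    _, start, end = single_node_overcover(lo, hi)
    return (end - start + 1) - (hi - lo + 1)


if __name__ == "__main__":
    # [3, 4] straddles the middle of the domain 0..7: one keyword instead of
    # two, but six extra values must be filtered by the client.
    print(single_node_overcover(3, 4), false_positive_values(3, 4))
```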

These results were presented at the 15th IFIP Annual Mediterranean Ad Hoc
Networking Workshop (Med-Hoc-Net 2016).
1 Introduction


Diabetic Retinopathy (DR) is an emerging worldwide epidemic of blindness.
Elevated blood glucose levels damage the small blood vessels that nourish the
retina and, in advanced stages, trigger the abnormal growth of new vessels. As
a result, small dilated blood vessels (microaneurysms) appear, and their
rupture is the source of intra-retinal hemorrhages and of the leakage of fluid
composed of lipoproteins and lipids (exudates).

The contribution of this work is a comparison between the Convexity Shape
Prior and GrabCut algorithms for optic disc (OD) segmentation in color fundus
images. The proposed methods are described in Section 2, and Section 3 briefly
presents the main conclusions.


2 Methodology 


The proposed OD segmentation methods are based on the Discrete Convexity
Shape [1] and GrabCut [5] algorithms. The general procedure consists of three
stages: (1) preprocessing, which uses the green channel of the RGB image and
the CIELAB lightness; (2) segmentation of the main blood vessels located on
the OD; and (3) OD segmentation with the two proposed algorithms. Finally, the
results are analyzed and the two algorithms are compared. Figure 1 shows these
stages.


2.1 Preprocessing 


Fig. 1: Flow Chart of the proposed OD segmentation method.


An accurate OD segmentation must avoid the false positives caused by the
presence of blood vessels. Hence, we first apply the contrast-limited adaptive
histogram equalization (CLAHE) algorithm to compensate for non-uniform
lighting. Then, in order to enhance the blood vessels and make them more
distinguishable, the brightness preserving dynamic fuzzy histogram
equalization (BPDFHE) [4] method is applied to the result. Finally, CLAHE is
applied once more to the resulting image.
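The contrast-limiting idea behind CLAHE can be sketched with a global, single-tile variant: the grey-level histogram is clipped at a limit, the clipped excess is redistributed over all grey levels, and the cumulative distribution becomes the look-up table. Real CLAHE additionally works on tiles and interpolates bilinearly between them, and BPDFHE is not shown; the clip limit (a fraction of the pixel count) is an assumption of this sketch. In practice a library routine such as MATLAB's `adapthisteq` would be used.

```python
def clip_limited_equalize(img, clip_limit=0.01, levels=256):
    """Global contrast-limited histogram equalization of a grey image
    (list of rows of ints in [0, levels))."""
    pixels = [v for row in img for v in row]
    n = len(pixels)
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    # clip every bin at the limit and collect the excess counts
    limit = max(1, int(clip_limit * n))
    excess = sum(max(0, h - limit) for h in hist)
    hist = [min(h, limit) for h in hist]
    # redistribute the excess as evenly as possible over all bins
    add, rem = divmod(excess, levels)
    hist = [h + add + (1 if i < rem else 0) for i, h in enumerate(hist)]
    # cumulative distribution -> monotonic look-up table onto [0, levels - 1]
    lut, acc = [], 0
    for h in hist:
        acc += h
        lut.append(round((levels - 1) * acc / n))
    return [[lut[v] for v in row] for row in img]


if __name__ == "__main__":
    print(clip_limited_equalize([[10, 10, 10, 200]]))  # [[191, 191, 191, 255]]
```

Clipping limits how much a dominant grey level (for instance the dark fundus background) can stretch the mapping, which is what keeps noise from being over-amplified.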


2.2 Morphological processing 


We follow a methodology similar to that of [3] to obtain the region of
interest where the OD is located (following the steps outlined in Algorithm
1). Figure 2 illustrates an example of a fundus image, preprocessed with
CLAHE, that contains exudates, whose appearance is similar to that of the OD.
It shows how these bright structures (exudates) are progressively eliminated
until a suitable region of interest is obtained.


2.3 Blood vessels segmentation 


This section explains how the proposed filter bank, composed of two average
filters and one Gaussian filter, is combined with a Gabor wavelet filter to
achieve effective detections. Specifically, the designed filter bank is based
on [2], where it is defined as a concatenation of Average - Gaussian - Average
sub-filters configured with three different kernel sizes (3 × 3, 9 × 9 and
69 × 69), while Gabor filters have mostly been used because of their
performance as feature extractors. In order to detect the whole vessel
structure, the Gabor wavelet filter has been generated from a mother wavelet
and configured with an arrangement of 8 orientations and 5 scales.
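A bank of real Gabor kernels with 8 orientations and 5 scales, as described above, can be generated as sketched below. The kernel is the standard real Gabor function (a Gaussian envelope times a cosine carrier); the wavelengths, the σ/λ ratio, the aspect ratio and the kernel size are hypothetical choices, since they are not reported here, and the mother-wavelet construction of [2] may differ.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, aspect=0.5, psi=0.0):
    """Real 2-D Gabor function sampled on a size x size grid (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # rotate the coordinates so that the carrier runs along theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (aspect * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def gabor_bank(n_orientations=8, wavelengths=(4, 6, 8, 12, 16), size=21):
    """One kernel per (scale, orientation): 5 x 8 = 40 kernels by default.
    The wavelengths and sigma = 0.56 * wavelength are hypothetical values."""
    bank = []
    for lam in wavelengths:
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations
            bank.append(gabor_kernel(size, lam, theta, sigma=0.56 * lam))
    return bank


if __name__ == "__main__":
    bank = gabor_bank()
    print(len(bank), len(bank[0]), len(bank[0][0]))  # 40 21 21
```

Convolving the preprocessed image with every kernel and keeping, per pixel, the maximum response over orientations and scales is a usual way to turn such a bank into a single vessel-enhancement map.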


Fig. 2: Morphological procedure to obtain the OD region. (a) G channel.
(b) Image preprocessed with the CLAHE algorithm. (c) to (g) Morphological
opening and closing operations with a radius of 4, 8, 12, 16 and 20 pixels,
respectively. (h) Resulting OD region of interest.


Algorithm 1 Morphological procedure to detect the OD region


Require: G_CLAHE
Ensure: OD_detection
StructuralElement ← Disc
Radius ← 4
G_closing ← G_CLAHE
while OD_detection ≠ true do
    G_opening ← imopen(G_closing, StructuralElement, Radius)
    G_closing ← imclose(G_opening, StructuralElement, Radius)
    G_BW ← Threshold_Otsu(G_closing)
        {the number of pixels determines whether OD_detection is true or false}
    if OD_detection = true then
        return true
    end if
    Radius ← Radius + 4   {increase the radius by 4 pixels}
end while
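Algorithm 1 can be sketched in pure Python on a small grey-level image stored as a list of rows. Greyscale opening and closing use a disc-shaped structuring element, Otsu's threshold binarizes the filtered image, and the radius grows by 4 pixels per iteration, as in the algorithm. The stopping test (a non-empty foreground of at most `max_pixels` pixels) and the `max_radius` guard are hypothetical, because the pixel-count criterion is not specified; in MATLAB the same steps correspond to `strel('disk', r)`, `imopen`, `imclose` and `graythresh`.

```python
def disk_offsets(radius):
    """Pixel offsets of a disc-shaped structuring element."""
    r2 = radius * radius
    return [(dy, dx) for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1) if dy * dy + dx * dx <= r2]


def _filter(img, offsets, pick):
    """min over the element = erosion, max = dilation (outside pixels ignored)."""
    h, w = len(img), len(img[0])
    return [[pick(img[y + dy][x + dx] for dy, dx in offsets
                  if 0 <= y + dy < h and 0 <= x + dx < w)
             for x in range(w)] for y in range(h)]


def opening(img, radius):
    se = disk_offsets(radius)
    return _filter(_filter(img, se, min), se, max)  # erosion, then dilation


def closing(img, radius):
    se = disk_offsets(radius)
    return _filter(_filter(img, se, max), se, min)  # dilation, then erosion


def otsu_threshold(img):
    """Grey level (0-255) maximizing the between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    w0 = sum0 = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def detect_od_region(g_clahe, max_pixels, step=4, max_radius=40):
    """Returns (binary mask, radius) once the hypothetical area test passes,
    or (None, radius) if max_radius is exceeded."""
    g_closing = g_clahe
    radius = step
    while radius <= max_radius:
        g_opening = opening(g_closing, radius)
        g_closing = closing(g_opening, radius)
        t = otsu_threshold(g_closing)
        mask = [[v > t for v in row] for row in g_closing]
        area = sum(sum(row) for row in mask)
        if 0 < area <= max_pixels:  # hypothetical detection criterion
            return mask, radius
        radius += step
    return None, radius
```

Each iteration filters the previous closing, so small bright structures (such as exudates) disappear first, while a large bright disc like the OD survives the growing structuring element.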


2.4 Optic disk segmentation 


In this phase, two different algorithms are proposed to perform OD
segmentation. Both are effective combinatorial optimization techniques based
on prior information such as shape, color separation and geometric
interactions. First, we adapt the recent Convexity Shape Prior algorithm [1]
for use in medical image applications. Then, the results obtained are compared
with those of the traditional, iterative GrabCut approach [5]. Figure 3 shows
a set of segmentation results, validated according to the experience of the
physician.
Fig. 3: OD segmentation results, shown as blue boundaries. The first row
contains the results of the Convexity Shape Prior algorithm and the second row
those of the GrabCut algorithm.


3 Conclusions 


An analysis of two algorithms (Convexity Shape Prior and GrabCut) for
interactive OD segmentation has been introduced. During the preprocessing
stage, blood vessels are enhanced with the CLAHE and BPDFHE methods to improve
their contrast. Next, in order to eliminate the blood vessels inside the OD
structure, a new matching filter has been designed. In addition, the accuracy
of the OD segmentation can be increased by interpolating the statistical color
information of the neighboring pixels with an inpaint NaN algorithm. Finally,
the OD segmentation algorithms are applied without depending on a predefined
shape.
Abstract. Clinical guidelines are valuable instruments to record and transmit
available evidence-based knowledge. Several medical informatics technologies
have been developed to merge the computer-interpretable knowledge
representations of the diseases that coexist in multimorbid patients. We
propose a classification of the currently available technologies and assess
their strengths and weaknesses.
1 Background


Progress in healthcare and medical treatment has contributed to a large
segment of the population living longer with associated chronic conditions
[1]. This phenomenon is associated with medical conditions involving
co-morbidity and multimorbidity.


2 Introduction 


Multimorbidity is the coexistence of multiple long-term diseases in the same
individual at the same time, with none of the diseases being more prominent
than the others [2,1], as Figure 1 depicts. Multimorbidity does not have
well-defined criteria for medical diagnosis [1]. The presence of
multimorbidity leads to uncovering other closely related problems, such as
disability (difficulty or lack of independence), frailty (state of
vulnerability) and patient complexity (dealing with medical, social and
behavioral factors) [1].

Co-morbidity, on the other hand, is defined as the coexistence of secondary
diseases associated with a primary (or index) disease, as Figure 2 shows.
Treatment is driven by the primary disease, but it is adapted to include
treatment for the secondary diseases [2,3].
Fig. 1: Diagram of multimorbidity.
Fig. 2: Diagram of co-morbidity.


Formal languages are suitable instruments for capturing the knowledge
embedded in clinical practice guidelines [1]. From this perspective, knowledge
acquisition and machine learning are two suitable approaches for capturing
the knowledge contained in clinical practice guidelines. Medical practitioners
use computer systems whose structures capture the medical knowledge contained
in clinical practice guidelines [4].

In the knowledge acquisition approach, clinical experts interpret the
clinical guidelines and knowledge engineers translate these interpretations
into computer structures [4,5]. Knowledge acquisition is evidence-based, but
time-consuming to implement.

The machine learning approach uses the information in databases related to
the management of multimorbid patients and generalizes past individual
experiences by constructing computer structures [6,7,8,9]. Developing and
building computer structures for managing multimorbid patients is a promising
new field of active research [10].

According to Abidi et al. [11] and Jafarpour [12], in the knowledge
acquisition approach knowledge can be combined at specific points along a
transformation process that starts with the clinical guidelines and ends with
the computer structures.

Abidi [13] further proposed that the knowledge about the diseases of
multimorbid patients can be combined at the modeling level and at the
execution level. In the former, the computer structures generated for each
disease are combined into a single computer structure ready to be used by
medical practitioners on multimorbid cases. The latter takes place when the
computer structures of each disease are executed separately and their results
are integrated into a single guideline.
Jafarpour extended the modeling and execution level approach by adding two
additional combination points: the first at the guideline level and the
second at the computerization level [12].


3 Objectives 


The objective is to perform an extensive literature review of technologies
for the computer-based combination of health-care knowledge to manage
multimorbid patients. The technologies will be described in detail and
compared analytically in terms of their strengths, weaknesses and maturity.


4 Classification of Methods


Knowledge integration. The knowledge integration method (KIM) is a general
class of methods that includes ontology merging, logic and constraint
satisfaction, and transition fitting. All of these approaches have in common
the explicit representation of the conflicts between single-disease
guidelines.

The treatment integration method (TIM) involves drug interaction checking,
which relates the pharmacy knowledge on treatments in order to detect drug
interactions (side effects) [14,15,16,17]. Another approach is drug
interaction resolution [18], which implements rules aimed at solving conflicts
upon detection and suggests a possible conflict-free treatment to physicians.
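A toy version of drug interaction checking and rule-based resolution is sketched below: the treatments recommended by two single-disease guidelines are merged, interacting pairs are looked up in a knowledge table, and a substitution rule proposes an alternative. All drug names, interactions and substitutions are hypothetical placeholders, not clinical knowledge, and the sketch does not reproduce any of the cited systems.

```python
# Hypothetical knowledge tables (placeholders, not clinical data).
INTERACTIONS = {frozenset({"drug_A", "drug_B"}): "increased bleeding risk"}
SUBSTITUTES = {"drug_B": "drug_C"}


def detect_conflicts(treatment):
    """Interacting pairs (a, b, reason) present in a merged treatment."""
    drugs = sorted(treatment)
    return [(a, b, INTERACTIONS[frozenset({a, b})])
            for i, a in enumerate(drugs) for b in drugs[i + 1:]
            if frozenset({a, b}) in INTERACTIONS]


def resolve(treatment, max_steps=10):
    """Apply substitution rules until no known conflict remains.
    Returns the conflict-free treatment, or None if no rule can solve it."""
    current = set(treatment)
    for _ in range(max_steps):
        conflicts = detect_conflicts(current)
        if not conflicts:
            return current
        a, b, _reason = conflicts[0]
        replaced = next((d for d in (b, a) if d in SUBSTITUTES), None)
        if replaced is None:
            return None  # conflict detected, but no resolution rule applies
        current = (current - {replaced}) | {SUBSTITUTES[replaced]}
    return None


if __name__ == "__main__":
    merged = {"drug_A"} | {"drug_B", "drug_D"}  # guideline 1 + guideline 2
    print(detect_conflicts(merged))
    print(sorted(resolve(merged)))  # ['drug_A', 'drug_C', 'drug_D']
```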

Clinical pathway pattern discovery (CPPD) uses clinical logs to infer pathway
patterns derived from joint probabilistic models.

State decision-action induction (SDAI) extracts clinical algorithms from 
episodes of care. CPM, CPPD and SDAI have in common that knowledge is 
extracted from past clinical interventions. 

The data integration approach deals with the analysis of episodes of care of
multiple patients to discover hidden patterns. Clinical process mining (CPM)
analyzes clinical logs of the sequential treatments applied to patients. The
analysis of these logs results in multimorbid treatment models after
identifying the common sequences, concurrences and branchings of the clinical
actions mined in the process. Additional technologies in this area are
dynamic programming optimization and latent Dirichlet allocation (LDA)
combined with Gibbs sampling [19]. Riaño et al. [20] proposed a four-step
process to obtain clinical algorithms that generalize the treatments described
in sets of episodes of care.
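A basic building block of clinical process mining is the directly-follows relation: counting how often one clinical action is immediately followed by another across the logged episodes of care exposes common sequences and branching points. The sketch below computes it from a toy log; the action names are hypothetical, and the specific mining algorithms of the cited works are not reproduced.

```python
from collections import Counter


def directly_follows(log):
    """log: list of episodes, each a list of clinical actions in temporal order.
    Returns a Counter of (action, next_action) pairs."""
    counts = Counter()
    for episode in log:
        counts.update(zip(episode, episode[1:]))
    return counts


def branching_points(dfg):
    """Actions followed by more than one distinct next action."""
    successors = {}
    for a, b in dfg:
        successors.setdefault(a, set()).add(b)
    return {a: sorted(s) for a, s in successors.items() if len(s) > 1}


if __name__ == "__main__":
    log = [["visit", "lab_test", "prescribe"],
           ["visit", "lab_test", "refer"],
           ["visit", "prescribe"]]
    dfg = directly_follows(log)
    print(dfg.most_common(1))  # [(('visit', 'lab_test'), 2)]
    print(branching_points(dfg))
```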


5 Discussions


Currently, clinical guidelines are the best source for dealing with diagnosis,
prognosis and treatment. The information in clinical guidelines is very
specific and comes from controlled situations or experiments. Therefore, it is
rich in internal validity, but low in external validity. In real-world cases,
such controlled situations do not occur, due to the heterogeneity of
multimorbid patients and the high cost of performing episodes of care in
controlled settings.